Abstract



We consider the problem of estimating the slope parameter in functional linear regression, where scalar responses $Y_1, \dots, Y_n$ are modeled in dependence of second order stationary random functions $X_1, \dots, X_n$. An orthogonal series estimator of the functional slope parameter with additional thresholding in the Fourier domain is proposed, and its performance is measured with respect to a wide range of weighted risks covering, as examples, the mean squared prediction error and the mean integrated squared error for derivative estimation. In this paper the minimax optimal rate of convergence of the estimator is derived over a large class of different regularity spaces for the slope parameter and of different link conditions for the covariance operator. These general results are illustrated by the particular example of the well-known Sobolev space of periodic functions as regularity space for the slope parameter and the case of finitely or infinitely smoothing covariance operators.

Keywords: Orthogonal series estimation, Spectral cut-off, Derivatives estimation, Mean squared error of prediction, Minimax theory, Sobolev space.
1 Introduction

Functional linear models have become very important in a diverse range of disciplines, including medicine, linguistics, chemometrics as well as econometrics (see for instance Ramsay and Silverman [2005] and Ferraty and Vieu [2006] for several case studies or, more specifically, Forni and Reichlin [1998] and Preda and Saporta [2005] for applications in economics). Roughly speaking, in all these applications the dependence of a response variable $Y$ on the variation of an explanatory random function $X$ is modeled by

$$Y = \int_0^1 \beta(t)X(t)\,dt + \sigma\varepsilon, \qquad (1.1)$$

for some error term $\varepsilon$ and noise level $\sigma > 0$. One objective is then to estimate nonparametrically the slope function $\beta$ based on an independent and identically distributed (i.i.d.) sample of $(Y, X)$.

In this paper we suppose that the random function $X$ takes its values in $L^2[0,1]$, which is endowed with the usual inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$, and that $X$ has a finite second moment, i.e., $E\|X\|^2 < \infty$. In order to simplify notation we assume that the mean function of $X$ is zero. Moreover, the random function $X$ and the error term $\varepsilon$ are uncorrelated, where $\varepsilon$ is assumed to have mean zero and variance one. This situation has been considered, for example, in Cardot et al. [2003] or Müller and Stadtmüller [2005]. Then multiplying both sides of (1.1) by $X(s)$ and taking the expectation leads to

$$g(s) := E[YX(s)] = \int_0^1 \operatorname{cov}(X(t), X(s))\,\beta(t)\,dt =: [T_{cov}\beta](s), \qquad s \in [0,1], \qquad (1.2)$$

where $g$ belongs to $L^2[0,1]$ and $T_{cov}$ denotes the covariance operator associated to the random function $X$. Estimation of $\beta$ is thus linked with the inversion of the covariance operator $T_{cov}$ of $X$ and is hence called an inverse problem. We assume that there exists a unique solution $\beta \in L^2[0,1]$ of equation (1.2), i.e., $g$ belongs to the range $\mathcal R(T_{cov})$ of $T_{cov}$, and $T_{cov}$ is injective. However, as usual in the context of inverse problems, all the results below could also be obtained straightforwardly for the unique least-squares solution with minimal norm, which exists if and only if $g$ is contained in the direct sum of $\mathcal R(T_{cov})$ and its orthogonal complement $\mathcal R(T_{cov})^\perp$ (for a definition and detailed discussion in the context of inverse problems see Chapter 2.1 in Engl et al. [2000], while in the special case of a functional linear model we refer to Cardot et al. [2003]).

The normal equation (1.2) is the continuous equivalent of the normal equation $E[XY] = E[XX^t]\beta$ in a linear model $Y = X^t\beta + \varepsilon$, where the covariance matrix $E[XX^t]$ always has a continuous generalized inverse. However, due to the finite second moment of $X$, the covariance operator $T_{cov}$ of $X$ defined in (1.2) is nuclear (cf. Dauxois et al. [1982]). Consequently, unlike in the linear model, a continuous generalized inverse of $T_{cov}$ does not exist if the range of the operator $T_{cov}$ is an infinite dimensional subspace of $L^2[0,1]$. This corresponds to the setup of ill-posed inverse problems (with the additional difficulty that $T_{cov}$ in (1.2) is unknown and hence has to be estimated).

In the literature several approaches have been proposed to circumvent the instability caused by inverting $T_{cov}$. Essentially, all of them replace the operator $T_{cov}$ in equation (1.2) by a regularized version having a continuous generalized inverse. A popular example is based on functional principal components regression (cf. Bosq [2000], Cardot et al. [2007] or Müller and Stadtmüller [2005]), which corresponds to a method called spectral cut-off in the numerical analysis literature (cf. Tautenhahn [1996]). Another example is Tikhonov regularization (cf. Hall and Horowitz [2007]), where the regularized solution $\beta_\alpha$ is defined as the unique minimizer of the Tikhonov functional $F_\alpha(\beta) = \|T_{cov}\beta - g\|^2 + \alpha\|\beta\|^2$ for some strictly positive $\alpha$. Regularization through a penalized least squares approach after projection onto some basis (such as splines) is also considered in Ramsay and Dalzell [1991], Eilers and Marx [1996] or Cardot et al. [2003].

In contrast to the model assumptions considered so far in the literature, in this paper we suppose that the regressor $X$ is second order stationary. Over relatively short periods of time, the assumption of second order stationarity is realistic in many situations and can be checked from the data by estimating the covariance function using the multiple realizations of $X$. Moreover, assuming second order stationarity allows us to generalize the known results in essentially two directions. First, we can unify the measures of performance for the estimator considered in the literature and, second, it is possible to present a simple estimation strategy which is optimal in a minimax sense over a wide range of possible regularity spaces for the slope function $\beta$ as well as various forms of link conditions for the covariance operator $T_{cov}$. More precisely:
In this paper we show that in the case of a second order stationary regressor $X$ the associated covariance operator $T_{cov}$ admits a spectral decomposition $\{\lambda_j, \psi_j, j \geq 1\}$ given by the trigonometric basis $\{\psi_j\}$ (defined below) as eigenfunctions and a strictly positive, possibly unordered, zero sequence $\lambda := (\lambda_j)_{j\geq1}$ of corresponding eigenvalues. Then the normal equation can be rewritten as follows:

$$\beta = \sum_{j=1}^\infty \frac{g_j}{\lambda_j}\,\psi_j \quad\text{with}\quad g_j := \langle g, \psi_j\rangle. \qquad (1.3)$$

It is well known that, even if the sequence $\lambda$ of eigenvalues is known a priori, replacing in (1.3) the unknown function $g$ by a consistent estimator $\hat g$ does in general not lead to an $L^2$-consistent estimator of $\beta$. To be more precise, since $\lambda$ is a zero sequence, $E\|\hat g - g\|^2 = o(1)$ does in general not imply $\sum_{j=1}^\infty \lambda_j^{-2}E|\langle \hat g - g, \psi_j\rangle|^2 = o(1)$, i.e., the inverse of the covariance operator $T_{cov}$ is not continuous. Essentially, all of the approaches mentioned above circumvent this instability by replacing equation (1.3) with a regularized version which prevents the denominator from becoming too small. For instance, in the case of Tikhonov regularization (cf. Hall and Horowitz [2007]) the factor $1/\lambda_j$ in (1.3) is replaced by $\lambda_j/(\alpha + \lambda_j^2)$.
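The instability of (1.3), and the effect of regularizing the factor $1/\lambda_j$, can be made concrete in a few lines. The following is a minimal numerical sketch, assuming hypothetical eigenvalues $\lambda_j = j^{-2}$ (truncated at $j = 200$) and the same deterministic perturbation $10^{-3}$ in every coefficient $g_j$; the helper names are illustrative only. It compares the naive factor $1/\lambda_j$, the Tikhonov factor $\lambda_j/(\alpha + \lambda_j^2)$ (the coefficient-wise minimizer of $F_\alpha$), and a thresholded factor $\mathbb 1\{\lambda_j \geq \alpha\}/\lambda_j$.

```python
import numpy as np

def naive_factor(lam):
    """Unregularized factor 1/lambda_j from (1.3)."""
    return 1.0 / lam

def tikhonov_factor(lam, alpha):
    """Tikhonov factor lambda_j / (alpha + lambda_j**2)."""
    return lam / (alpha + lam**2)

def cutoff_factor(lam, alpha):
    """Thresholded factor: 1/lambda_j if lambda_j >= alpha, and 0 otherwise."""
    out = np.zeros_like(lam)
    keep = lam >= alpha
    out[keep] = 1.0 / lam[keep]
    return out

# Hypothetical setting: lambda_j = j^-2 and a perturbation eps in every g_j.
J, eps, alpha = 200, 1e-3, 1e-2
lam = 1.0 / np.arange(1, J + 1) ** 2
noise = np.full(J, eps)

# Squared L2 error in the reconstructed beta caused by the perturbation alone.
errors = {
    "naive": float(np.sum((naive_factor(lam) * noise) ** 2)),
    "tikhonov": float(np.sum((tikhonov_factor(lam, alpha) * noise) ** 2)),
    "cut-off": float(np.sum((cutoff_factor(lam, alpha) * noise) ** 2)),
}
print(errors)
```

With these values the naive factor turns the small perturbation into a squared error of order $10^4$, while both regularized factors keep it small; the price of regularization is a bias, which this sketch does not show.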

In the literature so far, the performance of an estimator of $\beta$ has been measured either by a squared prediction error or by an integrated squared error. We show in this paper that these approaches can be unified by considering a loss given by a weighted norm. To be more precise, for $f \in L^2[0,1]$ we define

$$\|f\|_\omega^2 := \sum_{j=1}^\infty \omega_j\,|\langle f, \psi_j\rangle|^2 \qquad (1.4)$$

for some strictly positive sequence of weights $\omega := (\omega_j)_{j\geq1}$. Then the performance of an estimator $\hat\beta$ of $\beta$ is measured by the $\mathcal F_\omega$-risk, that is $E\|\hat\beta - \beta\|_\omega^2$. With an appropriate choice of the weight sequence $\omega$, this general framework covers both the mean integrated squared error, i.e., $\omega \equiv 1$, and the mean squared prediction error. Indeed, the squared prediction error of a new value of $Y$ given any random function $X_{n+1}$ possessing the same distribution as $X$ and being independent of $X_1, \dots, X_n$ can be evaluated as follows (see for example Cardot et al. [2003] or Crambes et al. [2009] for similar setups):



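$$E\Big[\big(\langle\hat\beta, X_{n+1}\rangle - \langle\beta, X_{n+1}\rangle\big)^2 \,\Big|\, \hat\beta\Big] = \langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle = \sum_{j\geq1}\lambda_j\,|\langle\hat\beta - \beta, \psi_j\rangle|^2,$$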



where we have used for the last identity that the regressor is second order stationary, i.e., $T_{cov}$ admits $\{\lambda_j, \psi_j, j \geq 1\}$ as spectral decomposition. Consequently, choosing $\omega = \lambda$, the $\mathcal F_\omega$-risk coincides with the mean squared prediction error. We present this specific situation in Section 4 below. It is worth noting that the $L^2$-norm $\|f^{(s)}\|$ of the $s$-th weak derivative $f^{(s)}$ of a function $f$, if it exists, can also be expressed through a specific weighted norm $\|\cdot\|_\omega$ with an appropriate choice of weights $\omega$ (cf. Neubauer [1988a]). Thus, by considering the corresponding $\mathcal F_\omega$-risk we also cover the estimation of derivatives of the slope function. This question is also discussed in detail in Section 4.
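As a numerical check of the prediction-error identity above, here is a minimal Monte Carlo sketch. It assumes a hypothetical Gaussian regressor $X_{n+1} = \sum_j \sqrt{\lambda_j}\,\xi_j\psi_j$ with i.i.d. standard normal $\xi_j$, eigenvalues $\lambda_j = j^{-2}$ truncated at $J = 20$, and randomly drawn coefficients $h_j$ standing in for $\langle\hat\beta - \beta, \psi_j\rangle$. Only coefficients are simulated, so no basis functions are needed; `weighted_sq_norm` is a hypothetical helper implementing (1.4).

```python
import numpy as np

def weighted_sq_norm(coef, weights):
    """||f||_omega^2 = sum_j omega_j |<f, psi_j>|^2 as in (1.4)."""
    return float(np.sum(weights * np.abs(coef) ** 2))

rng = np.random.default_rng(0)
J, n_mc = 20, 100_000
j = np.arange(1, J + 1)
lam = 1.0 / j**2                   # hypothetical eigenvalues of T_cov
h = rng.normal(size=J) / j         # stands in for <beta_hat - beta, psi_j>

# <h, X_{n+1}> = sum_j h_j sqrt(lam_j) xi_j for X_{n+1} = sum_j sqrt(lam_j) xi_j psi_j
xi = rng.standard_normal((n_mc, J))
mc_prediction_error = float(np.mean((xi @ (h * np.sqrt(lam))) ** 2))
weighted_risk = weighted_sq_norm(h, lam)   # weights omega = lambda
print(mc_prediction_error, weighted_risk)
```

The two numbers agree up to Monte Carlo error, which is exactly what choosing $\omega = \lambda$ in (1.4) expresses. Gaussianity is not needed for the identity; only uncorrelated standardized scores are.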

In this paper we characterize the a-priori information on the slope parameter, such as smoothness, by considering ellipsoids (see the definition below) in $L^2[0,1]$ with respect to a weighted norm $\|\cdot\|_\gamma$ for a pre-specified weight sequence $\gamma$. Again, an appropriate choice of the sequence $\gamma$ enables us not only to restrict the slope parameter to a class of differentiable functions (considered, e.g., in Crambes et al. [2009]) but also, for instance, to a class of analytic functions. Moreover, it is usually assumed that the sequence $\lambda$ of eigenvalues of $T_{cov}$ has a polynomial decay (cf. Hall and Horowitz [2007] or Crambes et al. [2009]). However, it is well known that this restriction may exclude several interesting cases, such as an exponential decay. Therefore, we do not impose a specific form of decay, but consider a third sequence of weights $v$ characterizing the decay of $\lambda$. We then show that the three sequences $\gamma$ (regularity of $\beta$), $v$ (regularity of $T_{cov}$) and $\omega$ (measure of the performance of the estimator) together determine the obtainable accuracy of any estimator. In particular, in Section 3 we derive a lower bound under minimal regularity conditions on these sequences. It is remarkable that a simple orthogonal series estimator attains this lower bound up to a constant under very mild moment assumptions on the regressor and the error term.

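To be more precise, we replace the unknown quantities $g_j$ and $\lambda_j$ in equation (1.3) by their empirical counterparts. That is, if $(Y_1, X_1), \dots, (Y_n, X_n)$ denotes an i.i.d. sample of $(Y, X)$, then for each $j \geq 1$ we consider the unbiased estimators

$$\hat g_j := \frac1n\sum_{i=1}^n Y_i\,\langle X_i, \psi_j\rangle \quad\text{and}\quad \hat\lambda_j := \frac1n\sum_{i=1}^n \langle X_i, \psi_j\rangle^2 \qquad (1.5)$$

for $g_j$ and $\lambda_j$, respectively. The orthogonal series estimator $\hat\beta$ of $\beta$ is then defined by

$$\hat\beta := \sum_{j=1}^m \frac{\hat g_j}{\hat\lambda_j}\,\mathbb 1\{\hat\lambda_j \geq \alpha\}\,\psi_j, \qquad (1.6)$$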

where the dimension parameter $m = m(n)$ and the threshold $\alpha = \alpha(n)$ have to tend to infinity and zero, respectively, as the sample size $n$ increases. Note that we introduce an additional threshold $\alpha$ on each estimated eigenvalue $\hat\lambda_j$, since it could be arbitrarily close to zero even if the true eigenvalue $\lambda_j$ is sufficiently far away from zero. Thresholding in the Fourier domain has been used, for example, in a deconvolution problem in Mair and Ruymgaart [1996], Neumann [1997] or Johannes [2009], and coincides with an approach called spectral cut-off in the numerical analysis literature (cf. Tautenhahn [1996]).
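To make the construction concrete, here is a minimal simulation sketch of the estimators (1.5) and the thresholded series estimator (1.6). It assumes curves observed on an equidistant grid of $[0,1]$ (inner products approximated by Riemann sums) and a hypothetical stationary Gaussian regressor built from the trigonometric basis with eigenvalue $(1+k)^{-2}$ at frequency $k$. The helper names `trig_basis` and `threshold_estimator`, the sample size, the noise level, the slope coefficients and the choices $m = 9$, $\alpha = 1/n$ are all illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def trig_basis(t, J):
    """First J trigonometric basis functions evaluated on the grid t (one per row)."""
    B = np.empty((J, t.size))
    B[0] = 1.0
    for idx in range(1, J):
        k = (idx + 1) // 2                      # frequency of psi_{idx+1}
        trig = np.cos if idx % 2 == 1 else np.sin
        B[idx] = np.sqrt(2.0) * trig(2 * np.pi * k * t)
    return B

def threshold_estimator(Y, X, t, m, alpha):
    """Fourier coefficients of the estimator (1.6); X holds n curves on the grid t."""
    dt = t[1] - t[0]
    scores = X @ trig_basis(t, m).T * dt            # <X_i, psi_j> by a Riemann sum
    g_hat = np.mean(Y[:, None] * scores, axis=0)    # (1.5), estimator of g_j
    lam_hat = np.mean(scores**2, axis=0)            # (1.5), estimator of lambda_j
    coef = np.zeros(m)
    keep = lam_hat >= alpha                         # threshold on estimated eigenvalues
    coef[keep] = g_hat[keep] / lam_hat[keep]
    return coef

# Hypothetical simulation set-up (all values are illustrative).
rng = np.random.default_rng(1)
n, T, J, sigma = 5000, 256, 40, 0.5
t = np.arange(T) / T
lam = (1.0 + (np.arange(J) + 1) // 2) ** -2.0   # eigenvalue (1+k)^-2 at frequency k
beta = np.zeros(J)
beta[:5] = [1.0, 0.5, -0.5, 0.25, 0.1]          # coefficients of the slope function

B = trig_basis(t, J)
X = (rng.standard_normal((n, J)) * np.sqrt(lam)) @ B     # stationary Gaussian curves
Y = X @ (beta @ B) / T + sigma * rng.standard_normal(n)  # model (1.1), Riemann sum

m, alpha = 9, 1.0 / n
coef = threshold_estimator(Y, X, t, m, alpha)
l2_error = float(np.sum((coef - beta[:m]) ** 2) + np.sum(beta[m:] ** 2))
print(np.round(coef, 3), l2_error)
```

Because paired basis functions share the same eigenvalue, the simulated curves are second order stationary. In this configuration every $\hat\lambda_j$ with $j \leq m$ is far above $1/n$, so the threshold never removes a term and the remaining error is essentially the variance of the ratios $\hat g_j/\hat\lambda_j$.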

The paper is organized as follows. In Section 2 we formalize the regularity conditions on the slope parameter $\beta$ and the covariance operator $T_{cov}$, characterized through different weight sequences. Moreover, we state the minimal conditions on these weight sequences as well as on the moments of the random function $X$ and the error term $\varepsilon$ used throughout the paper. In Section 3 we show consistency in the $\mathcal F_\omega$-risk of the proposed orthogonal series estimator under very mild assumptions. For example, considering the $L^2$-risk, i.e., $\omega \equiv 1$, no additional regularity conditions on the slope parameter are needed. Furthermore, we derive a lower and an upper bound for the $\mathcal F_\omega$-risk supposing only the minimal conditions on the sequences $\gamma$, $\omega$ and $v$. These results are illustrated in Section 4 by considering the mean squared prediction error as well as the optimal estimation of derivatives of $\beta$ in the case where the slope function belongs to a Sobolev space of periodic functions and the covariance operator $T_{cov}$ is finitely or infinitely smoothing. All proofs can be found in the Appendix.
2 Notations and basic assumptions



Second order stationarity. In this paper we suppose that the regressor $X$ is second order stationary, i.e., there exists a positive definite function $c: [-1,1] \to \mathbb R$ such that $\operatorname{cov}(X(t), X(s)) = c(t - s)$, $s, t \in [0,1]$. We show in Proposition A.1 in the Appendix that the eigenfunctions of the covariance operator $T_{cov}$ associated to $X$ are then given by the trigonometric basis

$$\psi_1 :\equiv 1, \quad \psi_{2j}(s) := \sqrt2\cos(2\pi js), \quad \psi_{2j+1}(s) := \sqrt2\sin(2\pi js), \qquad s \in [0,1],\ j \in \mathbb N, \qquad (2.1)$$

and the corresponding eigenvalues satisfy

$$\lambda_1 = \int_0^1 c(s)\,ds, \qquad \lambda_{2j} = \lambda_{2j+1} = \int_0^1 \cos(2\pi js)\,c(s)\,ds, \qquad j \in \mathbb N. \qquad (2.2)$$

Notice that the eigenfunctions are known to the statistician and only the eigenvalues depend on the unknown covariance function $c(\cdot)$, i.e., only the eigenvalues have to be estimated.
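Formula (2.2) can be checked numerically. The sketch below assumes a hypothetical covariance function $c(u) = \sum_{k\geq0} r^k\cos(2\pi ku) = (1 - r\cos 2\pi u)/(1 - 2r\cos 2\pi u + r^2)$ with $r = 1/2$, which is even and 1-periodic (an assumption made here so that the trigonometric basis diagonalizes the discretized operator exactly). For this choice $\lambda_1 = 1$ and $\lambda_{2k} = \lambda_{2k+1} = r^k/2$. The integrals in (2.2) are approximated by Riemann sums and compared with the eigenvalues of the discretized integral operator $f \mapsto \int_0^1 c(t-s)f(t)\,dt$; the helper names are illustrative.

```python
import numpy as np

def poisson_cov(u, r=0.5):
    """Hypothetical covariance c(u) = sum_{k>=0} r^k cos(2 pi k u), even and 1-periodic."""
    theta = 2 * np.pi * u
    return (1 - r * np.cos(theta)) / (1 - 2 * r * np.cos(theta) + r**2)

def eigenvalues_from_cov(c, J, grid=2048):
    """lambda_1, ..., lambda_J from (2.2), integrals replaced by Riemann sums on [0, 1)."""
    s = np.arange(grid) / grid
    cs = c(s)
    freq = (np.arange(J) + 1) // 2              # lambda_{2k} and lambda_{2k+1} share frequency k
    return np.array([np.mean(np.cos(2 * np.pi * k * s) * cs) for k in freq])

lam = eigenvalues_from_cov(poisson_cov, 7)

# Cross-check with the discretised operator (T f)(s) = int_0^1 c(t - s) f(t) dt.
N = 256
grid = np.arange(N) / N
K = poisson_cov(grid[:, None] - grid[None, :]) / N
top = np.sort(np.linalg.eigvalsh(K))[::-1][:7]
print(np.round(lam, 4), np.round(top, 4))
```

Both computations give $1, 1/4, 1/4, 1/8, 1/8, 1/16, 1/16$ up to rounding, illustrating that only the eigenvalues, not the eigenfunctions, depend on $c$.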



Minimal regularity conditions. It is well known that the obtainable accuracy of any estimator of the slope parameter $\beta$ is essentially determined by additional regularity conditions imposed on both the slope parameter $\beta$ and the sequence of eigenvalues $(\lambda_j)$ of the covariance operator. In this paper these conditions are characterized through different weighted norms in $L^2[0,1]$, which we formalize now. Given a strictly positive sequence of weights $w := (w_j)_{j\geq1}$ and a constant $c > 0$, denote by $\mathcal F_w^c$ the ellipsoid given by

$$\mathcal F_w^c := \Big\{f \in L^2[0,1] : \sum_{j=1}^\infty w_j\,|\langle f, \psi_j\rangle|^2 =: \|f\|_w^2 \leq c\Big\}.$$

Furthermore, let $\mathcal F_w := \{f \in L^2[0,1] : \|f\|_w < \infty\}$. Here and subsequently, we suppose that, given a strictly positive sequence of weights $\gamma := (\gamma_j)_{j\geq1}$, the slope function $\beta$ belongs to the ellipsoid $\mathcal F_\gamma^\rho$ for some $\rho > 0$. The ellipsoid $\mathcal F_\gamma^\rho$ then captures all the prior information (such as smoothness) about the unknown slope function $\beta$. It is worth noting that in the case $\gamma \equiv 1$ the set $\mathcal F_\gamma^\rho$ is an ellipsoid in $L^2[0,1]$ and hence does not impose additional restrictions on $\beta$. Furthermore, given a strictly positive sequence of weights $v := (v_j)_{j\geq1}$, we assume that the sequence of eigenvalues $(\lambda_j)_j$ of the covariance operator $T_{cov}$ is an element of the set $\mathcal S_v^d$ defined for $d \geq 1$ by

$$\mathcal S_v^d := \big\{(\lambda_j)_{j\geq1} : 1/d \leq \lambda_j/v_j \leq d,\ \forall j \in \mathbb N\big\}. \qquad (2.3)$$

Notice that the sequence of eigenvalues is summable, since $\sum_{j\in\mathbb N}\lambda_j = E\|X\|^2 < \infty$. Therefore, the sequence $v$ also has to be summable. We first consider this quite general class of eigenvalues. However, we illustrate condition (2.3) in Section 4 below by assuming a "regular decay" of the eigenvalues. Moreover, consider a strictly positive sequence of weights $\omega := (\omega_j)_{j\geq1}$. We then measure the performance of an estimator $\hat\beta$ of $\beta$ by the $\mathcal F_\omega$-risk, that is $E\|\hat\beta - \beta\|_\omega^2$. In Section 4 this approach is illustrated by considering different weight sequences $\omega$. Roughly speaking, an appropriate choice of $\omega$ enables us to cover both the estimation of derivatives of $\beta$ and the optimal estimation in terms of the mean prediction error. Finally, all the results below are derived under the following minimal regularity conditions.
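Both conditions introduced above can be checked on truncated sequences. Here is a minimal sketch with hypothetical helper names and example sequences: membership of $f$ in the ellipsoid $\mathcal F_w^c$ is decided from its first coefficients, and the link condition (2.3) from the ratios $\lambda_j/v_j$. Checking only finitely many indices gives a necessary condition for the infinite sequences, not a proof of membership.

```python
import numpy as np

def weighted_sq_norm(coef, w):
    """||f||_w^2 = sum_j w_j |<f, psi_j>|^2 for the given (truncated) coefficients."""
    return float(np.sum(w * np.abs(coef) ** 2))

def in_ellipsoid(coef, w, c):
    """Necessary check of f in F_w^c using the available coefficients only."""
    return weighted_sq_norm(coef, w) <= c

def in_link_class(lam, v, d):
    """Link condition (2.3), 1/d <= lambda_j / v_j <= d, on the available indices."""
    ratio = lam / v
    return bool(np.all((ratio >= 1.0 / d) & (ratio <= d)))

j = np.arange(1, 101)
v = j ** -2.0                            # hypothetical polynomial weights
lam = (1 + 0.4 * np.sin(j)) * v          # ratio lambda_j / v_j lies in [0.6, 1.4]
coef = j ** -2.0                         # hypothetical coefficients <beta, psi_j>

print(in_link_class(lam, v, d=2), in_link_class(lam, v, d=1.2))
print(in_ellipsoid(coef, j ** 2.0, c=2.0), in_ellipsoid(coef, j ** 4.0, c=2.0))
```

The same coefficient sequence lies in the ellipsoid for weights $j^2$ but not for the stronger weights $j^4$, which is how the choice of $\gamma$ encodes smoothness.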
Assumption 2.1. Let $\omega := (\omega_j)_{j\geq1}$, $\gamma := (\gamma_j)_{j\geq1}$ and $v := (v_j)_{j\geq1}$ be strictly positive sequences of weights with $\omega_1 = 1$, $\gamma_1 = 1$ and $v_1 = 1$ such that $\gamma$ and $(\gamma_j/\omega_j)_{j\geq1}$ are nondecreasing and $v$ is nonincreasing with $\Lambda := \sum_{j=1}^\infty v_j < \infty$.

Note that under Assumption 2.1 the ellipsoid $\mathcal F_\gamma^\rho$ is a subset of $\mathcal F_\omega^\rho$ and hence the $\mathcal F_\omega$-risk is a well-defined risk for $\hat\beta$. Roughly speaking, if $\mathcal F_\gamma^\rho$ describes $p$-times differentiable functions, then Assumption 2.1 ensures that the $\mathcal F_\omega$-risk involves at most $s \leq p$ derivatives.
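The normalization and monotonicity requirements of Assumption 2.1 can likewise be checked on truncated weight sequences. A minimal sketch, where the helper name and example weights are hypothetical; summability of $v$ cannot be decided from finitely many terms and is deliberately not checked. The second example, with $\omega_j = j^4$ and $\gamma_j = j^2$, fails because $\gamma_j/\omega_j$ decreases, i.e., the risk would require more smoothness than the prior class provides.

```python
import numpy as np

def check_assumption_2_1(omega, gamma, v):
    """Check Assumption 2.1 on truncated weights (index 0 corresponds to j = 1).

    Summability of v cannot be verified from finitely many terms and is not checked."""
    normalized = omega[0] == 1 and gamma[0] == 1 and v[0] == 1
    gamma_nondecreasing = np.all(np.diff(gamma) >= 0)
    ratio_nondecreasing = np.all(np.diff(gamma / omega) >= 0)
    v_nonincreasing = np.all(np.diff(v) <= 0)
    return bool(normalized and gamma_nondecreasing and ratio_nondecreasing and v_nonincreasing)

j = np.arange(1, 51, dtype=float)
v = j ** -2.0           # hypothetical polynomially decreasing link weights
gamma = j ** 2.0        # hypothetical smoothness weights
print(check_assumption_2_1(np.ones_like(j), gamma, v))   # L2-risk, omega = 1
print(check_assumption_2_1(v, gamma, v))                 # prediction-type risk, omega = v
print(check_assumption_2_1(j ** 4.0, gamma, v))          # gamma / omega decreasing
```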



Moment assumptions. The results derived below involve additional conditions on the moments of the random function $X$ and the error term $\varepsilon$, which we formalize now. Let $\mathcal X$ be the set of all centered second order stationary random functions $X$ with finite second moment, i.e., $E\|X\|^2 < \infty$, and strictly positive covariance operator. Then, given $X \in \mathcal X$, the random variables $\{\langle X, \psi_j\rangle/\sqrt{\lambda_j}, j \in \mathbb N\}$ are centered with variance one and, moreover, pairwise uncorrelated. Here and subsequently, $\mathcal X_\eta^m$, $m \in \mathbb N$, $\eta \geq 1$, denotes the subset of $\mathcal X$ containing all random functions $X$ such that the $m$-th absolute moments of the corresponding standardized random variables $\{\langle X, \psi_j\rangle/\sqrt{\lambda_j}, j \in \mathbb N\}$ are uniformly bounded, that is

$$\mathcal X_\eta^m := \Big\{X \in \mathcal X \ \text{with}\ \sup_{j\in\mathbb N} E\Big|\frac{\langle X, \psi_j\rangle}{\sqrt{\lambda_j}}\Big|^m \leq \eta\Big\}. \qquad (2.4)$$

It is worth noting that if $X \in \mathcal X$ is a Gaussian random function, the corresponding random variables $\{\langle X, \psi_j\rangle/\sqrt{\lambda_j}, j \in \mathbb N\}$ form an i.i.d. sample of Gaussian random variables with mean zero and variance one. Hence, for each $k \in \mathbb N$ there exists $\eta$ such that any Gaussian random function $X \in \mathcal X$ also belongs to $\mathcal X_\eta^k$. In what follows, $\mathcal E_\eta^m$ stands for the set of all centered error terms $\varepsilon$ with variance one and finite $m$-th moment bounded by $\eta$, i.e., $E|\varepsilon|^m \leq \eta$.
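The moment bound in (2.4) has a direct empirical counterpart. Here is a minimal sketch, assuming score vectors $\langle X_i, \psi_j\rangle$ of a hypothetical second order stationary $X$ simulated directly in coefficient form (paired eigenvalues $(1+k)^{-2}$ at frequency $k$), once with Gaussian and once with uniform standardized scores; `empirical_eta` is a hypothetical helper. For $m = 4$ the population values are $3$ (Gaussian) and $9/5$ (uniform on $[-\sqrt3, \sqrt3]$).

```python
import numpy as np

def empirical_eta(scores, lam, m):
    """Empirical version of sup_j E|<X, psi_j> / sqrt(lambda_j)|^m from (2.4)."""
    standardized = scores / np.sqrt(lam)
    return float(np.max(np.mean(np.abs(standardized) ** m, axis=0)))

rng = np.random.default_rng(2)
n, J = 100_000, 10
lam = (1.0 + (np.arange(J) + 1) // 2) ** -2.0    # hypothetical paired eigenvalues

# Gaussian X: standardized scores are i.i.d. N(0, 1), whose 4th moment is 3.
gauss_scores = np.sqrt(lam) * rng.standard_normal((n, J))
# Non-Gaussian X: uniform standardized scores on [-sqrt(3), sqrt(3)], 4th moment 9/5.
unif_scores = np.sqrt(lam) * rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, J))

eta_gauss = empirical_eta(gauss_scores, lam, 4)
eta_unif = empirical_eta(unif_scores, lam, 4)
print(eta_gauss, eta_unif)
```

Both regressors therefore belong to $\mathcal X_\eta^4$ for any $\eta$ slightly above the printed values; the supremum over $j$ is taken only over the simulated indices.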



3 Optimality in the general case 

Consistency. The $\mathcal F_\omega$-risk of the estimator $\hat\beta$ given in (1.6) is essentially determined by the deviation of the estimators of $(g_j)_j$ and $(\lambda_j)_j$ and by the regularization error due to the threshold. The next assertion summarizes minimal conditions that ensure consistency of the estimator defined in (1.6).

Proposition 3.1 (Consistency). Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$. Let $\beta \in \mathcal F_\gamma$, $X \in \mathcal X_\eta^4$ and $\varepsilon \in \mathcal E_\eta^4$, $\eta \geq 1$. Consider the estimator $\hat\beta$ with dimension parameter $m := m(n)$ and threshold $\alpha := \alpha(n)$ satisfying $m \to \infty$, $\alpha = o(1)$ and $(\max_{1\leq j\leq m}\omega_j)(n\alpha^2)^{-1} = o(1)$ as $n \to \infty$. If in addition $\gamma$ and $\omega$ satisfy Assumption 2.1, then $E\|\hat\beta - \beta\|_\omega^2 = o(1)$ as $n \to \infty$.

Remark 3.1. Since the last result covers the case $\gamma = \omega \equiv 1$, it follows that the estimator $\hat\beta$ is consistent without any additional restriction on $\beta \in L^2[0,1]$ provided $m \to \infty$, $\alpha = o(1)$ and $n\alpha^2 \to \infty$ as $n \to \infty$. □



The lower bound. It is well known that in general the hardest one-dimensional subproblem does not capture the full difficulty of estimating the solution of an inverse problem, even in the case of a known operator (for details see, e.g., the proof in Mair and Ruymgaart [1996]). In other words, there do not exist two sequences of slope functions $\beta_{1,n}, \beta_{2,n} \in \mathcal F_\gamma^\rho$ which are statistically not consistently distinguishable and satisfy $\|\beta_{1,n} - \beta_{2,n}\|_\omega^2 \geq C\delta_n^*$, where $\delta_n^*$ is the optimal rate of convergence. Therefore, we need to consider subsets of $\mathcal F_\gamma^\rho$ with a growing number of elements in order to obtain the optimal lower bound. More specifically, we obtain the following lower bound by applying Assouad's cube technique (see, e.g., Korostelev and Tsybakov [1993] or Chen and Reiß [2008]) under the additional assumption that the error term $\varepsilon$ is standard normally distributed, i.e., $\varepsilon \sim \mathcal N(0,1)$, and independent of the regressor.

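Theorem 3.2. Assume an $n$-sample of $(Y, X)$ obeying (1.1) with $\sigma > 0$. Suppose that the error term $\varepsilon \sim \mathcal N(0,1)$ is independent of the second order stationary regressor $X$ with associated sequence of eigenvalues $(\lambda_j) \in \mathcal S_v^d$. Consider $\mathcal F_\gamma^\rho$, $\rho > 0$, as set of slope functions. Let $m^* := m^*(n) \in \mathbb N$ and $\delta_n^* := \delta_n^*(m^*) \in \mathbb R_+$ for some $\Delta \geq 1$ be chosen such that

$$\frac{1}{\Delta} \leq \frac{\gamma_{m^*}}{\omega_{m^*}}\sum_{j=1}^{m^*}\frac{\omega_j}{n\,v_j} \leq \Delta \quad\text{and}\quad \delta_n^* := \frac{\omega_{m^*}}{\gamma_{m^*}}. \qquad (3.1)$$

If in addition Assumption 2.1 is satisfied, then for any estimator $\hat\beta$ we have

$$\sup_{\beta\in\mathcal F_\gamma^\rho} E\|\hat\beta - \beta\|_\omega^2 \geq c\,\max(\delta_n^*, 1/n)$$

for some constant $c > 0$ not depending on $n$.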

Remark 3.2. The normality assumption in the last theorem is only used to simplify the calculation of the distance between distributions corresponding to different slope functions. Obviously, the derived lower bound remains valid under the less restrictive assumption that the error term $\varepsilon$ belongs to $\mathcal E_\eta^m$ for some $m \in \mathbb N$ and sufficiently large $\eta$. Furthermore, it is worth noting that the lower bound tends to zero only if $(\omega_j/\gamma_j)$ is a zero sequence. In other words, in the case $\gamma \equiv 1$, i.e., without any additional restriction on $\beta \in L^2[0,1]$, uniform consistency over $L^2[0,1]$ in the $\mathcal F_\omega$-risk is only possible if the weighted norm $\|\cdot\|_\omega$ is weaker than the usual $L^2$-norm, that is, if $\omega$ is a zero sequence. This obviously reflects the ill-posedness of the underlying inverse problem. □



The upper bound. The next theorem states that the rate $\max(\delta_n^*, 1/n)$ of the lower bound given in Theorem 3.2 also provides an upper bound for the proposed estimator $\hat\beta$. Therefore, the rate $\max(\delta_n^*, 1/n)$ is optimal and hence the estimator $\hat\beta$ is minimax-optimal.

Theorem 3.3. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$. Suppose that the regressor $X$ is second order stationary with associated sequence of eigenvalues $(\lambda_j) \in \mathcal S_v^d$. Consider $m^* := m^*(n)$ and $\delta_n^* := \delta_n^*(m^*)$ given in (3.1) for some $\Delta \geq 1$. Let $\hat\beta$ be the estimator defined in (1.6) with $m := m^*$ and $\alpha := n^{-1}\min\big(1, \gamma_{m^*}/(2d\Delta)\big)$. If in addition $X \in \mathcal X_\eta^k$ and $\varepsilon \in \mathcal E_\eta^k$, $k \geq 4$, then for some generic constant $C > 0$ we have

$$\sup_{\beta\in\mathcal F_\gamma^\rho} E\|\hat\beta - \beta\|_\omega^2 \leq C\,d^5\Delta^3\eta\,[\rho d\Lambda + \sigma^2]\,\max(\delta_n^*, 1/n)$$

for all sequences $\gamma$, $\omega$ and $v$ satisfying Assumption 2.1.

Remark 3.3. It is worth noting that the bound derived in the last theorem is non-asymptotic. Furthermore, as in the case of the lower bound (see Remark 3.2), the upper bound also tends to zero only if $(\omega_j/\gamma_j)$ is a zero sequence. Therefore, the estimator $\hat\beta$ is consistent even without any additional restriction on $\beta \in L^2[0,1]$, i.e., $\gamma \equiv 1$, as long as $\omega$ is a zero sequence. We stress that Theorem 3.3 implies that for all sequences $\gamma$, $\omega$ and $v$ satisfying the minimal regularity Assumption 2.1 the orthogonal series estimator $\hat\beta$ attains the optimal rate $\max(\delta_n^*, 1/n)$ and hence is minimax-optimal. In particular, it is easily seen that the optimal rate $\max(\delta_n^*, 1/n)$ is parametric if and only if $\sum_{j=1}^\infty \omega_j/v_j < \infty$. Hence, in this case the rate of the orthogonal series estimator $\hat\beta$ is again parametric without any additional restriction on $\beta \in L^2[0,1]$, i.e., $\gamma \equiv 1$. Finally, as long as the sequence $\gamma$ is unbounded, the threshold parameter $\alpha$ in Theorem 3.3 satisfies $\alpha = 1/n$ for all sufficiently large $n$. Thus, in this situation the only remaining open problem is how to choose the dimension parameter $m$ adaptively from the data. We are currently exploring this issue. □
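For given weight sequences, a dimension $m^*$ and rate $\delta_n^*$ of the kind required in (3.1) can be computed numerically. The sketch below assumes that (3.1) balances the bias-type term $\omega_m/\gamma_m$ against the variance-type term $\sum_{j\leq m}\omega_j/(n v_j)$ up to the factor $\Delta$, and simply picks the $m$ minimizing the larger of the two; the helper name and the weights (mean prediction error with $\omega = v$, $v_j = j^{-2a}$, $\gamma_j = j^{2p}$, $p = a = 1$, $n = 10^6$) are hypothetical.

```python
import numpy as np

def balanced_dimension(omega, gamma, v, n):
    """Minimize max(omega_m / gamma_m, sum_{j<=m} omega_j / (n v_j)) over m.

    Arrays are truncated weight sequences with index 0 corresponding to j = 1."""
    bias = omega / gamma                        # candidate for delta_n^*
    variance = np.cumsum(omega / (n * v))
    idx = int(np.argmin(np.maximum(bias, variance)))
    ratio = variance[idx] / bias[idx]           # (gamma_m / omega_m) * sum_{j<=m} omega_j / (n v_j)
    return idx + 1, float(bias[idx]), float(ratio)

j = np.arange(1, 1001, dtype=float)
p, a, n = 1, 1, 10**6
v = j ** (-2.0 * a)        # polynomial link weights
omega = v                  # mean prediction error: omega = v
gamma = j ** (2.0 * p)     # hypothetical smoothness weights
m_star, delta_star, ratio = balanced_dimension(omega, gamma, v, n)
print(m_star, delta_star, ratio, n ** (1 / (2 * p + 2 * a + 1)))
```

Here the balanced dimension is $16$, consistent with the order $n^{1/(2p+2a+1)} \approx 15.8$, and the balancing ratio stays within $[1/\Delta, \Delta]$ for $\Delta = 2$. In practice $\sigma$, $d$ and the true weights are unknown, which is why the adaptive choice of $m$ remains open.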

4 Mean prediction error and derivative estimation 

In this section we suppose that the slope function $\beta$ is an element of the Sobolev space of periodic functions $\mathcal W_p$ given for $p > 0$ by

$$\mathcal W_p := \big\{f \in \mathcal H_p : f^{(j)}(0) = f^{(j)}(1),\ j = 0, 1, \dots, p-1\big\},$$

where $\mathcal H_p := \{f \in L^2[0,1] : f^{(p-1)}\ \text{absolutely continuous},\ f^{(p)} \in L^2[0,1]\}$ is a Sobolev space (cf. Neubauer [1988a, b], Mair and Ruymgaart [1996] or Tsybakov [2004]). If we consider the sequence of weights $(w_j^p)_{j\geq1}$ given by

$$w_1^p = 1 \quad\text{and}\quad w_{2j}^p = w_{2j+1}^p = j^{2p}, \quad j \in \mathbb N, \qquad (4.1)$$

then the Sobolev space $\mathcal W_p$ of periodic functions is equivalently given by $\mathcal F_{w^p}$. Therefore, let us denote by $\mathcal W_p^\rho := \mathcal F_{w^p}^\rho$, $\rho > 0$, an ellipsoid in the Sobolev space $\mathcal W_p$. For $p = 0$ we again use the convention that $\mathcal W_p^\rho$ denotes an ellipsoid in $L^2[0,1]$.

Mean prediction error. We first measure the performance of an estimator $\hat\beta$ by the mean prediction error (MPE), i.e., $E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle$. If the sequence of eigenvalues $(\lambda_j)$ associated to the covariance operator $T_{cov}$ satisfies a link condition, that is, $(\lambda_j) \in \mathcal S_v^d$ for some weight sequence $v$ (see definition (2.3)), then the MPE is equivalent to the $\mathcal F_\omega$-risk with $\omega = v$, that is, $d^{-1}\,E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle \leq E\|\hat\beta - \beta\|_v^2 \leq d\,E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle$. To illustrate the previous results, we assume in the following that the sequence $v$ is either polynomially decreasing, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, or exponentially decreasing, i.e., $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$. In the polynomial case, easy calculus shows that a covariance operator $T_{cov}$ with eigenvalues $(\lambda_j) \in \mathcal S_v^d$, i.e., $\lambda_j \asymp |j|^{-2a}$, acts like integrating $(2a)$ times and hence is called finitely smoothing (cf. Natterer [1984]). This is the case considered, for example, in Crambes et al. [2009]. On the other hand, in the exponential case it can easily be seen that the link condition $(\lambda_j) \in \mathcal S_v^d$, i.e., $\lambda_j \asymp \exp(-|j|^{2a})$, implies $\mathcal R(T_{cov}) \subset \mathcal W_s$ for all $s > 0$; therefore the operator $T_{cov}$ is called infinitely smoothing (cf. Mair [1994]). Since in both cases the minimal regularity conditions given in Assumption 2.1 are satisfied, the lower bounds presented in the next assertion follow directly from Theorem 3.2. Here and subsequently, we write $a_n \lesssim b_n$ when there exists $C > 0$ such that $a_n \leq C\,b_n$ for all sufficiently large $n \in \mathbb N$, and $a_n \sim b_n$ when $a_n \lesssim b_n$ and $b_n \lesssim a_n$ simultaneously.

Proposition 4.1. Under the assumptions of Theorem 3.2 we have for any estimator $\hat\beta$

(i) in the polynomial case, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, that

$$\sup_{\beta\in\mathcal W_p^\rho} E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle \gtrsim n^{-(2p+2a)/(2p+2a+1)};$$

(ii) in the exponential case, i.e., $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, that

$$\sup_{\beta\in\mathcal W_p^\rho} E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle \gtrsim n^{-1}(\log n)^{1/(2a)}.$$
On the other hand, if the dimension parameter $m$ and the threshold $\alpha$ in the definition of the estimator $\hat\beta$ given in (1.6) are chosen appropriately, then by applying Theorem 3.3 the rates of the lower bound given in the last assertion also provide, up to a constant, the upper bound of the risk of the estimator $\hat\beta$, which is summarized in the next proposition. We have thus proved that these rates are optimal and that the proposed estimator $\hat\beta$ is minimax optimal in both cases.

Proposition 4.2. Under the assumptions of Theorem 3.3 consider the estimator $\hat\beta$

(i) in the polynomial case, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, with dimension $m \sim n^{1/(2p+2a+1)}$ and threshold $\alpha \sim 1/n$. Then we have

$$\sup_{\beta\in\mathcal W_p^\rho} E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle \lesssim n^{-(2p+2a)/(2p+2a+1)};$$

(ii) in the exponential case, i.e., $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, with dimension $m \sim (\log n)^{1/(2a)}$ and threshold $\alpha \sim 1/n$. Then

$$\sup_{\beta\in\mathcal W_p^\rho} E\langle T_{cov}(\hat\beta - \beta), \hat\beta - \beta\rangle \lesssim n^{-1}(\log n)^{1/(2a)}.$$
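The tuning rules of Proposition 4.2 are orders of magnitude only, so any implementation must fix the hidden constants. The sketch below, with a hypothetical helper `mpe_tuning`, simply sets all constants to 1 and reports the resulting $m$, $\alpha$ and MPE rate for both cases; in practice $m$ would be rounded up to an integer.

```python
import math

def mpe_tuning(n, p, a, case):
    """Orders of m, alpha and the MPE rate in Proposition 4.2, with all constants set to 1."""
    if case == "polynomial":
        m = n ** (1.0 / (2 * p + 2 * a + 1))
        rate = n ** (-(2 * p + 2 * a) / (2 * p + 2 * a + 1))
    elif case == "exponential":
        m = math.log(n) ** (1.0 / (2 * a))
        rate = math.log(n) ** (1.0 / (2 * a)) / n
    else:
        raise ValueError(f"unknown case: {case}")
    return m, 1.0 / n, rate

n = 10**6
for p in (0, 1, 2):
    m_poly, _, r_poly = mpe_tuning(n, p, 1, "polynomial")
    m_exp, _, r_exp = mpe_tuning(n, p, 1, "exponential")
    print(p, math.ceil(m_poly), r_poly, math.ceil(m_exp), r_exp)
```

The printout makes the contrast discussed below visible: in the polynomial case both $m$ and the rate change with $p$, while in the exponential case neither does.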

Remark 4.1. It is of interest to compare our results with those of Crambes et al. [2009], who measure the performance of their estimator in terms of the prediction error. In their notation the eigenvalues of $T_{cov}$ are assumed to decrease at the order $|j|^{-2q-1}$, i.e., $q = a - 1/2$. Furthermore, they suppose the slope function to be $m$-times continuously differentiable, i.e., $m = p$. Using this reparametrization we see that our results in the polynomial case imply the same rate of convergence in probability of the prediction error as presented in Crambes et al. [2009]. However, our general results yield a lower and an upper bound of the MPE not only in the polynomial case but also in the exponential case.

Furthermore, we emphasize the interesting influence of the parameters $p$ and $a$ characterizing the smoothness of $\beta$ and the decay of the eigenvalues of $T_{cov}$, respectively. As we see from Propositions 4.1 and 4.2, in the polynomial case an increasing value of $p$ leads to a faster optimal rate. In other words, as expected, a smoother slope function can be estimated faster. The situation in the exponential case is very different. It seems rather surprising that, contrary to the polynomial case, in the exponential case the optimal rate of convergence does not depend on the value of $p$; however, this dependence is clearly hidden in the constant. Furthermore, the dimension parameter $m$ does not even depend on the value of $p$. Thereby, the proposed estimator is automatically adaptive, i.e., it does not involve a-priori knowledge of the degree of smoothness of the slope function $\beta$. However, the choice of the dimension parameter depends on the value $a$ specifying the decay of the eigenvalues of $T_{cov}$. Note further that in both cases an increasing value of $a$ leads to a faster optimal rate of convergence, i.e., we may call $1/a$ the degree of ill-posedness (cf. Natterer [1984]). Finally, we stress that Proposition 4.2 covers the case $p = 0$, i.e., $\hat\beta$ is consistent with optimal MPE-rate without additional restrictions on $\beta \in L^2[0,1]$. □

Estimation of the derivatives. Let us now consider the estimation of derivatives of the slope function $\beta$. It is well known that for any function $g$ belonging to a Sobolev ellipsoid $\mathcal W_p^\rho = \mathcal F_{w^p}^\rho$ with weights $w^p$ given in (4.1), the weighted norm $\|g\|_{w^s}$ for each $0 \leq s \leq p$ is linked to the $L^2$-norm of the $s$-th weak derivative $g^{(s)}$ through $\|g^{(s)}\|^2 \leq (2\pi)^{2s}\|g\|_{w^s}^2$, with equality whenever $\langle g, \psi_1\rangle = 0$. Thereby, the results in Section 3 again imply a lower as well as an upper bound of the $L^2$-risk for the estimation of the $s$-th weak derivative of $\beta$. In the following we again consider the two particular cases of polynomially and exponentially decreasing weight sequences $(v_j)$. The next assertion summarizes lower bounds for the $L^2$-risk for the estimation of the $s$-th weak derivative $\beta^{(s)}$ of $\beta$ in both cases.
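The link between the Sobolev weights (4.1) and derivative norms can be verified on a trigonometric polynomial. Here is a minimal sketch with hypothetical helpers and random coefficients: it differentiates $f = \sum_j c_j\psi_j$ term by term (each derivative multiplies frequency $k$ by $2\pi k$ and shifts the phase by $\pi/2$), computes $\|f^{(s)}\|^2$ by a grid average (exact here, since the grid resolves all frequencies), and compares it with $(2\pi)^{2s}\big(\|f\|_{w^s}^2 - \langle f, \psi_1\rangle^2\big)$.

```python
import numpy as np

def sobolev_weights(J, p):
    """Weights (4.1): w_1 = 1 and w_{2k} = w_{2k+1} = k^(2p); index 0 is j = 1."""
    freq = (np.arange(J) + 1) // 2
    w = freq.astype(float) ** (2 * p)
    w[0] = 1.0
    return w

def derivative_on_grid(coef, s, t):
    """s-th derivative of f = sum_j coef_j psi_j (trigonometric basis), evaluated on t."""
    out = np.full_like(t, coef[0] if s == 0 else 0.0)
    for idx in range(1, len(coef)):
        k = (idx + 1) // 2
        trig = np.cos if idx % 2 == 1 else np.sin
        # each derivative multiplies by 2*pi*k and shifts the phase by pi/2
        phase = 2 * np.pi * k * t + s * np.pi / 2
        out += coef[idx] * np.sqrt(2) * (2 * np.pi * k) ** s * trig(phase)
    return out

rng = np.random.default_rng(3)
J, s, T = 21, 2, 512
t = np.arange(T) / T
coef = rng.normal(size=J) / (1.0 + np.arange(J)) ** 3     # hypothetical coefficients
deriv_sq_norm = float(np.mean(derivative_on_grid(coef, s, t) ** 2))
weighted = (2 * np.pi) ** (2 * s) * float(np.sum(sobolev_weights(J, s) * coef**2))
print(deriv_sq_norm, weighted, weighted - (2 * np.pi) ** (2 * s) * coef[0] ** 2)
```

The only gap between $\|f^{(s)}\|^2$ and $(2\pi)^{2s}\|f\|_{w^s}^2$ comes from the constant term, which the weight $w_1 = 1$ still counts but differentiation removes.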

Proposition 4.3. Under the assumptions of Theorem 3.2 we have for any estimator $\hat\beta^{(s)}$ of $\beta^{(s)}$

(i) in the polynomial case, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, that

$$\sup_{\beta\in\mathcal W_p^\rho} E\|\hat\beta^{(s)} - \beta^{(s)}\|^2 \gtrsim n^{-(2p-2s)/(2p+2a+1)};$$

(ii) in the exponential case, i.e., $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, that

$$\sup_{\beta\in\mathcal W_p^\rho} E\|\hat\beta^{(s)} - \beta^{(s)}\|^2 \gtrsim (\log n)^{-(p-s)/a}.$$

On the other hand, considering the estimator $\hat\beta$ given in (1.6), we only have to calculate the $s$-th derivative of $\hat\beta$. Using the exponential basis, which is linked to the trigonometric basis by the relation $\exp(2\pi ikt) = 2^{-1/2}\big(\psi_{2k}(t) + i\,\psi_{2k+1}(t)\big)$ for $k \in \mathbb N$ and $t \in [0,1]$, with $i^2 = -1$, the $s$-th weak derivative $\hat\beta^{(s)}$ of $\hat\beta$ is, for $0 \leq s < p$,

$$\hat\beta^{(s)}(t) = \sum_{k\in\mathbb Z}(2\pi ik)^s\Big(\int_0^1 \hat\beta(u)\exp(-2\pi iku)\,du\Big)\exp(2\pi ikt), \qquad t \in [0,1]. \qquad (4.2)$$

Note that the sum in (4.2) contains only a finite number of nonzero summands and hence its numerical implementation is straightforward. Furthermore, if the dimension parameter $m$ and the threshold $\alpha$ in the definition of the estimator $\hat\beta$ given in (1.6) are chosen appropriately, then by applying Theorem 3.3 the rates of the lower bound given in the last assertion again provide, up to a constant, the upper bound of the $L^2$-risk of the estimator $\hat\beta^{(s)}$, which is summarized in the next proposition. We have thus proved that these rates are optimal and that the proposed estimator $\hat\beta^{(s)}$ is minimax optimal in both cases.
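Since $\hat\beta$ is a trigonometric polynomial, (4.2) can be evaluated exactly with the discrete Fourier transform. Here is a minimal sketch, assuming $\hat\beta$ is sampled on the grid $t_l = l/T$ with $T$ larger than twice its highest frequency; `weak_derivative_fft` is a hypothetical helper built on NumPy's `fft`, `ifft` and `fftfreq`, and the test function $\psi_2 + 0.5\,\psi_5$ is an illustrative stand-in for an estimate.

```python
import numpy as np

def weak_derivative_fft(values, s):
    """Evaluate (4.2) for a trigonometric polynomial sampled at t_l = l / T, l = 0, ..., T-1."""
    T = values.size
    k = np.fft.fftfreq(T, d=1.0 / T)          # integer frequencies 0, 1, ..., -1
    fourier = np.fft.fft(values) / T          # int_0^1 f(u) exp(-2 pi i k u) du (exact here)
    return (np.fft.ifft((2j * np.pi * k) ** s * fourier) * T).real

T = 64
t = np.arange(T) / T
# hypothetical estimate beta_hat = psi_2 + 0.5 * psi_5 (frequencies 1 and 2)
beta_hat = np.sqrt(2) * np.cos(2 * np.pi * t) + 0.5 * np.sqrt(2) * np.sin(4 * np.pi * t)
first = weak_derivative_fft(beta_hat, 1)
exact_first = (-2 * np.pi * np.sqrt(2) * np.sin(2 * np.pi * t)
               + 0.5 * np.sqrt(2) * 4 * np.pi * np.cos(4 * np.pi * t))
print(np.max(np.abs(first - exact_first)))
```

Working on the grid avoids handling the basis functions explicitly: the FFT computes the Fourier integrals in (4.2), the multiplication by $(2\pi ik)^s$ differentiates, and the inverse FFT sums the finitely many nonzero terms.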
Proposition 4.4. Under the assumptions of Theorem 3.3 consider the estimator $\hat\beta^{(s)}$

(i) in the polynomial case, i.e., $v_1 = 1$ and $v_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, with $m \sim n^{1/(2p+2a+1)}$ and threshold $\alpha \sim 1/n$. Then

$$\sup_{\beta\in\mathcal W_p^\rho} E\|\hat\beta^{(s)} - \beta^{(s)}\|^2 \lesssim n^{-(2p-2s)/(2p+2a+1)};$$

(ii) in the exponential case, i.e., $v_1 = 1$ and $v_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, with $m \sim (\log n)^{1/(2a)}$ and threshold $\alpha \sim 1/n$. Then

$$\sup_{\beta\in\mathcal W_p^\rho} E\|\hat\beta^{(s)} - \beta^{(s)}\|^2 \lesssim (\log n)^{-(p-s)/a}.$$

Remark 4.2. It is worth noting that the $L^2$-risk in estimating the slope function $\beta$ itself, i.e., $s = 0$, has been considered in Hall and Horowitz [2007], but only in the polynomial case. In their notation the eigenvalues of $T_{cov}$ decrease at the order $|j|^{-\alpha}$, i.e., $\alpha = 2a$. Furthermore, the Fourier coefficients of the slope function decay at least at the rate $|j|^{-\beta}$ (here $\beta$ denotes their decay exponent), i.e., $\beta = p + 1/2$. Using this reparametrization we see that we recover the result of Hall and Horowitz [2007] in the polynomial case with $s = 0$, but without the additional assumption $\beta > \alpha/2 + 1$ or $\beta > \alpha - 1/2$.

We now discuss again the influence of the parameters $p$, $s$ and $a$. As Propositions 4.3 and 4.4 show, in both cases a decrease of $a$ or an increase of $p$ leads to a faster optimal rate of convergence. Hence, in contrast to the MPE, for the $L^2$-risk the parameter $a$ describes the degree of ill-posedness in both cases. Moreover, estimating higher derivatives of the slope function, i.e., considering a larger value of $s$, is as usual only possible with a slower optimal rate. Finally, as for the MPE, in the exponential case the dimension parameter $m$ does not depend on the values of $p$ or $s$. Hence the proposed estimator is automatically adaptive. □

Remark 4.3. There is an interesting issue hidden in the parametrization we have chosen. Consider a classical indirect regression model with known operator $T_{cov}$, i.e., $Y = [T_{cov}\beta](U) + \varepsilon$, where $U$ is uniformly distributed on $[0,1]$ and $\varepsilon$ is white noise (for details see e.g. Mair and Ruymgaart [1996]). Suppose in addition that the operator $T_{cov}$ is finitely smoothing, i.e., $(\upsilon_j)$ is polynomially decreasing with $\upsilon_j = j^{-2a}$, $j \ge 2$. Then, given an $n$-sample of $Y$, the optimal rate of convergence of the $L^2$-risk of any estimator of $\beta^{(s)}$ is of order $n^{-2(p-s)/[2(p+2a)+1]}$, since $\mathcal R(T_{cov}) = \mathcal W_{2a}$ (cf. Mair and Ruymgaart [1996] or Chen and Reiß [2008]). However, we have shown that in a functional linear model the optimal rate is of order $n^{-2(p-s)/[2(p+a)+1]}$, even with an estimated operator. Comparing both rates, we see that in a functional linear model the covariance operator $T_{cov}$ has degree of ill-posedness $a$, while the same operator has degree of ill-posedness $2a$ in the indirect regression model. In other words, in a functional linear model we do not face the complexity of inverting $T_{cov}$, but only of inverting its square root $T_{cov}^{1/2}$. Roughly speaking, this may be seen as a multiplication of the normal equation $YX = \langle\beta, X\rangle X + \varepsilon X$ by the inverse of $T_{cov}^{1/2}$. Notice that $T_{cov}$ is also the covariance operator associated with the error term $\varepsilon X$. Thus the multiplication by the inverse of $T_{cov}^{1/2}$ leads, roughly speaking, to white noise and hence to an indirect regression model defined by $T_{cov}^{1/2}$ rather than by $T_{cov}$. The same finding holds true for an infinitely smoothing operator $T_{cov}$. However, in this situation $(\log n)^{-(p-s)/a}$ is the optimal rate in the indirect regression model given by $T_{cov}$ as well as by $T_{cov}^{1/2}$. Thus, the effect described above is not formally visible, but is actually hidden in the order symbol. □
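The comparison in Remark 4.3 reduces to simple arithmetic on the exponents of the polynomial rates. The sketch below uses hypothetical function names and encodes both exponents with exact fractions. It checks that the indirect-regression exponent for eigenvalues $j^{-2a'}$ with $a' = a/2$, i.e., for the eigenvalues $j^{-a}$ of $T_{cov}^{1/2}$, coincides with the functional-linear-model exponent for $T_{cov}$.

```python
from fractions import Fraction


def flm_exponent(p, s, a):
    """r in n^(-r): functional linear model, L2-risk of the s-th derivative, v_j = j^(-2a)."""
    return Fraction(2 * (p - s)) / (2 * (p + a) + 1)


def indirect_exponent(p, s, a):
    """r in n^(-r): indirect regression with known operator of eigenvalues j^(-2a)."""
    return Fraction(2 * (p - s)) / (2 * (p + 2 * a) + 1)


p, s, a = 2, 0, 1
print(flm_exponent(p, s, a), indirect_exponent(p, s, a))  # 4/7 4/9
```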



A Appendix 

Proposition A.1. Let $X$ be second order stationary with $\mathbb E[X(t)X(s)] = c(t-s)$, $t, s \in [0,1]$, for some positive definite function $c : [-1,1] \to \mathbb R$. Then the associated covariance operator $T_{cov}$ admits an eigenvalue decomposition with eigenfunctions given by the trigonometric basis defined in (2.1) and corresponding eigenvalues given by (2.2).

Proof. Let $f \in L^2[0,1]$ and consider $g = T_{cov}f = \int_0^1 f(t)c(\cdot - t)\,dt$. Since $c$ is even, it is straightforward to show that $\int_0^1 g(s)e^{-is\lambda}\,ds = \int_0^1 f(s)e^{-is\lambda}\,ds\int_{-1}^1 c(s)\cos(s\lambda)\,ds$ and $\int_0^1 g(s)e^{is\lambda}\,ds = \int_0^1 f(s)e^{is\lambda}\,ds\int_{-1}^1 c(s)\cos(s\lambda)\,ds$ for all $\lambda \in \mathbb R$. Hence we obtain, for all $\lambda \in \mathbb R$, the identities

$$\int_0^1 g(s)\cos(s\lambda)\,ds = \int_0^1 f(s)\cos(s\lambda)\,ds\int_{-1}^1 c(s)\cos(s\lambda)\,ds,$$
$$\int_0^1 g(s)\sin(s\lambda)\,ds = \int_0^1 f(s)\sin(s\lambda)\,ds\int_{-1}^1 c(s)\cos(s\lambda)\,ds.$$

Considering the trigonometric basis $\{\psi_n\}$ and the values $\{\lambda_n\}$ given in (2.1) and (2.2), respectively, we have thus shown that $\langle T_{cov}f, \psi_n\rangle = \langle f, \psi_n\rangle\lambda_n$ for all $f \in L^2[0,1]$ and $n \in \mathbb N$, which proves the result. □
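Proposition A.1 can be checked numerically in a simple special case. Beyond the proposition, the sketch assumes that $c$ is 1-periodic. Here $c$ is a hypothetical trigonometric polynomial with nonnegative coefficients, and hence positive definite. Under this assumption, the Riemann-sum discretization of $T_{cov}$ on an equispaced grid is a circulant matrix, and the trigonometric basis is reproduced exactly with eigenvalue $\int_0^1 c(u)\cos(2\pi k u)\,du$ for both $\psi_{2k}$ and $\psi_{2k+1}$. The code does not reproduce (2.2) itself.

```python
import numpy as np

N = 64                          # grid size
t = np.arange(N) / N            # equispaced grid on [0, 1)


def c(u):
    """Hypothetical 1-periodic, even, positive definite covariance function."""
    return 1.0 + 0.5 * np.cos(2 * np.pi * u) + 0.25 * np.cos(4 * np.pi * u)


# Riemann-sum discretisation of (T_cov f)(t) = int_0^1 c(t - u) f(u) du
T = c(t[:, None] - t[None, :]) / N


def psi(j):
    """Trigonometric basis function psi_j evaluated on the grid."""
    if j == 1:
        return np.ones(N)
    k = j // 2
    wave = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * wave(2 * np.pi * k * t)


def eigenvalue(j):
    """int_0^1 c(u) cos(2 pi k u) du with k = floor(j / 2), computed on the grid."""
    k = j // 2
    return np.mean(c(t) * np.cos(2 * np.pi * k * t))


max_err = max(np.max(np.abs(T @ psi(j) - eigenvalue(j) * psi(j))) for j in range(1, 10))
print(max_err)  # expected to be at rounding-error level
```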






A.1 Proofs of Section 3

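We begin by defining and recalling the notation used in the proofs:

$$X_{ij} := \langle X_i, \psi_j\rangle,\quad \beta_j := \langle\beta, \psi_j\rangle,\quad T_{n,j} := \frac1n\sum_{i=1}^n X_{ij}(Y_i - \beta_j X_{ij}),\quad \widehat\lambda_j := \frac1n\sum_{i=1}^n X_{ij}^2,$$
$$\beta_m := \sum_{j=1}^m\beta_j\,1\{\widehat\lambda_j \ge \alpha\}\,\psi_j,\qquad \widetilde\beta_m := \sum_{j=1}^m\beta_j\psi_j. \qquad (A.1)$$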

At the end of this section we prove two technical lemmas (Lemmas A.2 and A.3), which are used in the following proofs.
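To make the notation in (A.1) concrete, here is a minimal simulation sketch in coefficient space. For illustration only, it assumes Gaussian coefficients $X_{ij}$ with variances $\lambda_j = j^{-2}$ (truncated at $J = 30$ components), slope coefficients $\beta_j = j^{-3/2}$, and that (1.6) is the thresholded estimator $\widehat\beta = \sum_{j=1}^m(\widehat g_j/\widehat\lambda_j)1\{\widehat\lambda_j\ge\alpha\}\psi_j$ with $\widehat g_j = n^{-1}\sum_i Y_iX_{ij}$. This form is an assumption that is consistent with the identity (A.9). With $T_{n,j} = n^{-1}\sum_i X_{ij}(Y_i - \beta_jX_{ij})$ and $\beta_m = \sum_{j\le m}\beta_j1\{\widehat\lambda_j\ge\alpha\}\psi_j$, the sketch verifies numerically that the coefficients of $\widehat\beta - \beta_m$ equal $(T_{n,j}/\widehat\lambda_j)1\{\widehat\lambda_j\ge\alpha\}$.

```python
import numpy as np


def thresholded_estimator(Xc, Y, m, alpha):
    """Thresholded spectral cut-off estimator in coefficient space (assumed form of (1.6)).

    Xc[i, j-1] = X_ij = <X_i, psi_j>, Y = responses. Returns the first m coefficients
    of beta_hat, the empirical eigenvalues hat lambda_j and the threshold indicator.
    """
    lam_hat = np.mean(Xc[:, :m] ** 2, axis=0)          # hat lambda_j
    g_hat = np.mean(Xc[:, :m] * Y[:, None], axis=0)    # hat g_j = n^{-1} sum_i Y_i X_ij
    keep = lam_hat >= alpha                             # thresholding in the Fourier domain
    beta_hat = np.zeros(m)
    beta_hat[keep] = g_hat[keep] / lam_hat[keep]
    return beta_hat, lam_hat, keep


rng = np.random.default_rng(0)
n, J, sigma = 500, 30, 0.5            # hypothetical sample size, truncation, noise level
idx = np.arange(1, J + 1)
lam = idx ** -2.0                     # hypothetical eigenvalues lambda_j
beta = idx ** -1.5                    # hypothetical slope coefficients beta_j
Xc = rng.standard_normal((n, J)) * np.sqrt(lam)
Y = Xc @ beta + sigma * rng.standard_normal(n)

m, alpha = 8, 1.0 / n
beta_hat, lam_hat, keep = thresholded_estimator(Xc, Y, m, alpha)

# T_{n,j} from (A.1); coefficients of beta_hat - beta_m should equal T_{n,j} / hat lambda_j on {keep}
T_nj = np.mean(Xc[:, :m] * (Y[:, None] - Xc[:, :m] * beta[:m]), axis=0)
beta_m = beta[:m] * keep
print(np.max(np.abs((beta_hat - beta_m) - np.where(keep, T_nj / lam_hat, 0.0))))
```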

Proof of consistency. 

Proof of Proposition 3.1. The proof is based on the decomposition

$$\|\widehat\beta - \beta\|_\omega^2 \le 2\|\widehat\beta - \beta_m\|_\omega^2 + 2\|\beta_m - \beta\|_\omega^2. \qquad (A.2)$$

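Under the moment condition on $X$ defined in (2.4) and the corresponding condition on $\varepsilon$, we show below that for some universal constant $C > 0$

$$\mathbb E\|\widehat\beta - \beta_m\|_\omega^2 \le C\,\Big(\sup_{j\ge1}\omega_j\Big)\,(n\alpha^2)^{-1}\,\mathbb E\|X\|^2\,\{\sigma^2 + \|\beta\|^2\,\mathbb E\|X\|^2\}\,\eta, \qquad (A.3)$$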



while, given $\|\beta\|_\omega < \infty$, we conclude from Lebesgue's dominated convergence theorem that

$$\mathbb E\|\beta_m - \beta\|_\omega^2 = o(1)\quad\text{if } 1/m = o(1) \text{ and } \alpha = o(1) \text{ as } n \to \infty. \qquad (A.4)$$

The conditions on $m$ and $\alpha$ thus ensure that both terms on the right-hand side of (A.2) converge to zero as $n \to \infty$, which gives the result.

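Proof of (A.3). Using the notation given in (A.1), it follows that

$$\mathbb E\|\widehat\beta - \beta_m\|_\omega^2 = \sum_{j=1}^m\omega_j\,\mathbb E\Big[\frac{|T_{n,j}|^2}{\widehat\lambda_j^2}\,1\{\widehat\lambda_j\ge\alpha\}\Big] \le \frac{1}{\alpha^2}\Big(\sup_{j\ge1}\omega_j\Big)\sum_{j\ge1}\lambda_j\;\sup_{j\ge1}\big\{\lambda_j^{-1}\,\mathbb E|T_{n,j}|^2\big\}.$$

Since $\sum_{j\ge1}\lambda_j = \mathbb E\|X\|^2$, applying (A.10) in Lemma A.2 (with moment index 1) gives (A.3).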
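The proof of (A.4) is based on the decomposition

$$\mathbb E\|\beta_m - \beta\|_\omega^2 \le 2\Big\{\sum_{j=1}^\infty\omega_j\beta_j^2\,1\{j > m\} + \sum_{j=1}^m\omega_j\beta_j^2\,P(\widehat\lambda_j < \alpha)\Big\} \le 2\sum_{j=1}^\infty\omega_j\beta_j^2 = 2\|\beta\|_\omega^2 < \infty.$$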



Thus Lebesgue's dominated convergence theorem implies the result, because $1/m = o(1)$ and $\alpha = o(1)$ as $n \to \infty$ imply, for each $j \in \mathbb N$, that $1\{j > m\} = o(1)$ and $P(\widehat\lambda_j < \alpha) = o(1)$. The latter can be seen as follows. Since $\alpha = o(1)$ as $n \to \infty$, there exists $n_j > 0$ such that $\lambda_j \ge 2\alpha$ for all $n \ge n_j$. Hence $P(\widehat\lambda_j < \alpha) \le P(\widehat\lambda_j/\lambda_j < 1/2)$, which together with (A.12) in Lemma A.2 implies the assertion. This completes the proof. □
Proof of the lower bound.

Proof of Theorem 3.2. Let $X_i$, $i \in \mathbb N$, be i.i.d. copies of $X$, which is second order stationary with associated sequence of eigenvalues $(\lambda_j) \in \mathcal S_\upsilon^d$. Consider independent error terms $\varepsilon_i \sim \mathcal N(0,1)$, $i \in \mathbb N$, which are independent of the random functions $\{X_i\}$. Let $\theta \in \{-1,1\}^{m_*}$, where $m_* := m_*(n) \in \mathbb N$ satisfies (3.1) for some $\Delta \ge 1$. Consider the $m_*$-vector $b$ of coefficients $b_j$ given in (A.15) in Lemma A.3. For each $\theta$, define a slope function $\beta_\theta := \sum_{j=1}^{m_*}\theta_j b_j\psi_j$, which belongs to $\mathcal F_\gamma^\rho$ due to (A.16) in Lemma A.3. Consequently, for each $\theta$ the random variables $(Y_i, X_i)$ with $Y_i := \int_0^1\beta_\theta(s)X_i(s)\,ds + \sigma\varepsilon_i$, $i = 1,\dots,n$, form a sample of the model (1.1), and we denote its joint distribution by $P_\theta$. Furthermore, for $j = 1,\dots,m_*$ and each $\theta$, we introduce $\theta^{(j)}$ by $\theta^{(j)}_l = \theta_l$ for $l \ne j$ and $\theta^{(j)}_j = -\theta_j$. Under $P_\theta$ the conditional distribution of $Y_i$ given $X_i$ is Gaussian with mean $\sum_{l=1}^{m_*}\theta_l b_l X_{il}$ and variance $\sigma^2$. Hence it is easily seen that the log-likelihood of $P_{\theta^{(j)}}$ w.r.t. $P_\theta$ is given by



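$$\log\Big(\frac{dP_{\theta^{(j)}}}{dP_\theta}\Big) = -\frac{2\theta_j b_j}{\sigma^2}\sum_{i=1}^n\Big(Y_i - \sum_{l=1}^{m_*}\theta_l b_l X_{il}\Big)X_{ij} - \frac{2b_j^2}{\sigma^2}\sum_{i=1}^n X_{ij}^2$$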



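and its expectation w.r.t. $P_\theta$ satisfies $\mathbb E_{P_\theta}[\log(dP_{\theta^{(j)}}/dP_\theta)] = -(2n/\sigma^2)\,b_j^2\,\mathbb E X_{1j}^2$. In terms of the Kullback-Leibler divergence this means $KL(P_\theta, P_{\theta^{(j)}}) = (2n/\sigma^2)\,b_j^2\,\mathbb E X_{1j}^2 \le (2dn/\sigma^2)\,b_j^2\,\upsilon_j$, using that $(\lambda_j)_{j\ge1} \in \mathcal S_\upsilon^d$. Since the Hellinger distance satisfies $H^2(P_{\theta^{(j)}}, P_\theta) \le KL(P_\theta, P_{\theta^{(j)}})$, it follows from (A.16) in Lemma A.3 that

$$H^2(P_{\theta^{(j)}}, P_\theta) \le \frac{2dn}{\sigma^2}\,b_j^2\,\upsilon_j \le 1,\qquad j = 1,\dots,m_*. \qquad (A.5)$$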



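Consider the Hellinger affinity $\rho(P_{\theta^{(j)}}, P_\theta) = \int\sqrt{dP_{\theta^{(j)}}\,dP_\theta}$. Since $|\langle\beta_{\theta^{(j)}} - \beta_\theta, \psi_j\rangle| = 2b_j$, the Cauchy-Schwarz inequality gives, for any estimator $\breve\beta$,

$$\rho^2(P_{\theta^{(j)}}, P_\theta) \le \frac{1}{2b_j^2}\Big\{\mathbb E_{\theta^{(j)}}|\langle\breve\beta - \beta_{\theta^{(j)}}, \psi_j\rangle|^2 + \mathbb E_\theta|\langle\breve\beta - \beta_\theta, \psi_j\rangle|^2\Big\}. \qquad (A.6)$$

Due to the identity $\rho(P_{\theta^{(j)}}, P_\theta) = 1 - \frac12H^2(P_{\theta^{(j)}}, P_\theta)$, combining (A.5) with (A.6) yields

$$\mathbb E_{\theta^{(j)}}|\langle\breve\beta - \beta_{\theta^{(j)}}, \psi_j\rangle|^2 + \mathbb E_\theta|\langle\breve\beta - \beta_\theta, \psi_j\rangle|^2 \ge \frac{b_j^2}{2},\qquad j = 1,\dots,m_*.$$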

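From this we conclude for each estimator $\breve\beta$ that

$$\begin{aligned}\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\breve\beta - \beta\|_\omega^2 &\ge \sup_{\theta\in\{-1,1\}^{m_*}}\mathbb E_\theta\|\breve\beta - \beta_\theta\|_\omega^2 \ge \frac{1}{2^{m_*}}\sum_{\theta\in\{-1,1\}^{m_*}}\sum_{j=1}^{m_*}\omega_j\,\mathbb E_\theta|\langle\breve\beta - \beta_\theta, \psi_j\rangle|^2\\ &= \frac{1}{2^{m_*}}\sum_{\theta\in\{-1,1\}^{m_*}}\sum_{j=1}^{m_*}\frac{\omega_j}{2}\Big\{\mathbb E_\theta|\langle\breve\beta - \beta_\theta, \psi_j\rangle|^2 + \mathbb E_{\theta^{(j)}}|\langle\breve\beta - \beta_{\theta^{(j)}}, \psi_j\rangle|^2\Big\}\\ &\ge \frac14\sum_{j=1}^{m_*}\omega_j b_j^2 \ge \frac{\zeta}{4\Delta}\max(\delta_*, 1/n),\end{aligned}$$

where the last inequality follows again from (A.16) in Lemma A.3, which completes the proof. □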
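The two Gaussian computations in the proof of Theorem 3.2 can be sanity-checked by simulation. First, the closed-form log-likelihood ratio agrees with the ratio of the conditional Gaussian densities. Second, its mean under $P_\theta$ is close to $-(2n/\sigma^2)b_j^2\lambda_j$. The sketch takes $\upsilon_j = \lambda_j = j^{-2}$ (so $d = 1$) and $b_j^2 = 0.1/(n\upsilon_j)$. These values, like the variable names, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m_star, sigma = 200, 5, 1.0
lam = np.arange(1, m_star + 1) ** -2.0        # hypothetical eigenvalues (d = 1)
b = np.sqrt(0.1 / (n * lam))                  # hypothetical b_j
theta = rng.choice([-1.0, 1.0], size=m_star)
j = 2                                         # flipped coordinate (0-based)
theta_j = theta.copy()
theta_j[j] = -theta[j]                        # theta^{(j)}


def llr_direct(X, Y):
    """log dP_{theta^(j)}/dP_theta from the Gaussian conditional densities of Y given X."""
    mu, mu_j = X @ (theta * b), X @ (theta_j * b)
    return np.sum((Y - mu) ** 2 - (Y - mu_j) ** 2) / (2 * sigma ** 2)


def llr_formula(X, Y):
    """Closed form of the log-likelihood ratio used in the proof of Theorem 3.2."""
    resid = Y - X @ (theta * b)
    return (-2 * theta[j] * b[j] / sigma ** 2 * np.sum(resid * X[:, j])
            - 2 * b[j] ** 2 / sigma ** 2 * np.sum(X[:, j] ** 2))


draws = []
for _ in range(2000):
    X = rng.standard_normal((n, m_star)) * np.sqrt(lam)
    Y = X @ (theta * b) + sigma * rng.standard_normal(n)   # one sample drawn from P_theta
    assert np.isclose(llr_direct(X, Y), llr_formula(X, Y))
    draws.append(llr_formula(X, Y))

print(np.mean(draws), -2 * n / sigma ** 2 * b[j] ** 2 * lam[j])
```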
Proof of the upper bound.

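Proof of Theorem 3.3. The proof is based on the decomposition (A.2). Under the moment conditions on $X$ and $\varepsilon$ imposed in Theorem 3.3, and provided $\lambda_j \ge 2\alpha$ for $1 \le j \le m$, we show below that for some generic constant $C > 0$ the following two bounds hold:

$$\mathbb E\|\widehat\beta - \beta_m\|_\omega^2 \le C\sum_{j=1}^m\frac{\omega_j}{n\lambda_j}\,\{\|\beta\|^2\,\mathbb E\|X\|^2 + \sigma^2\}\,\eta\,\{\lambda_j^2/(n\alpha)^2 + 1/n + 1\}, \qquad (A.7)$$
$$\mathbb E\|\beta_m - \beta\|_\omega^2 \le C\{\omega_m/\gamma_m + \eta/n\}\,\|\beta\|_\gamma^2. \qquad (A.8)$$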
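Consequently, for all $\beta \in \mathcal F_\gamma^\rho$ and $(\lambda_j) \in \mathcal S_\upsilon^d$, i.e., $\upsilon_j/d \le \lambda_j \le d\upsilon_j$, and hence $\mathbb E\|X\|^2 = \sum_{j\ge1}\lambda_j \le d\sum_{j\ge1}\upsilon_j$, it follows that

$$\mathbb E\|\widehat\beta - \beta\|_\omega^2 \le C\Big\{d\big(d^2/(n\alpha)^2 + 1/n + 1\big)\sum_{j=1}^m\frac{\omega_j}{n\upsilon_j} + \omega_m/\gamma_m + 1/n\Big\}\,\eta\,\Big[\rho d\sum_{j\ge1}\upsilon_j + \sigma^2\Big].$$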



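Let $m_*$ and $\delta_*$ be given by (3.1) for some $\Delta \ge 1$. Then the choice $m = m_*$ and $\alpha = (1/n)\min(1, \gamma_{m_*}/(2d\Delta))$ implies

$$\mathbb E\|\widehat\beta - \beta\|_\omega^2 \le C\,d\,\{d^2/(n\alpha)^2 + 1/n + 1\}\,\Delta\max(\delta_*, 1/n)\,\eta\,\Big[\rho d\sum_{j\ge1}\upsilon_j + \sigma^2\Big],$$

because $\omega_m/\gamma_m = \delta_*$, $\sum_{j=1}^m\omega_j/(n\upsilon_j) \le \Delta\delta_*$, and $\lambda_j \ge 2\alpha$ for $1 \le j \le m$. The last property follows from $\upsilon_{m_*} \ge \gamma_{m_*}/(n\Delta)$ and $(\lambda_j)_{j\ge1} \in \mathcal S_\upsilon^d$. Hence, since $n\alpha \ge 1/(2d\Delta)$, the result follows.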
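Proof of (A.7). Using $T_{n,j}$ introduced in (A.1), we obtain the identity

$$\|\widehat\beta - \beta_m\|_\omega^2 = \sum_{j=1}^m\omega_j\,\frac{|T_{n,j}|^2}{\lambda_j^2}\,\Big|\frac{\lambda_j}{\widehat\lambda_j}\Big|^2\,1\{\widehat\lambda_j \ge \alpha\}. \qquad (A.9)$$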

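Applying the elementary inequality $1/2 \le |\widehat\lambda_j/\lambda_j - 1|^2 + |\widehat\lambda_j/\lambda_j|^2$ twice, it follows that

$$|\lambda_j/\widehat\lambda_j|^2\,1\{\widehat\lambda_j \ge \alpha\} \le 2\big\{2(\lambda_j/\alpha)^2\,|\widehat\lambda_j/\lambda_j - 1|^4 + 2|\widehat\lambda_j/\lambda_j - 1|^2 + 1\big\}.$$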
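Therefore, combining the last estimate with (A.9) and the Cauchy-Schwarz inequality, we have

$$\mathbb E\|\widehat\beta - \beta_m\|_\omega^2 \le 2\sum_{j=1}^m\frac{\omega_j}{\lambda_j}\Big(\frac{\mathbb E|T_{n,j}|^4}{\lambda_j^2}\Big)^{1/2}\Big\{2\frac{\lambda_j^2}{\alpha^2}\big(\mathbb E|\widehat\lambda_j/\lambda_j - 1|^8\big)^{1/2} + 2\big(\mathbb E|\widehat\lambda_j/\lambda_j - 1|^4\big)^{1/2} + 1\Big\}.$$

The estimate (A.7) now follows from (A.10) and (A.11) in Lemma A.2.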
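Proof of (A.8). Following the lines of the proof of (A.4), we obtain

$$\mathbb E\|\beta_m - \beta\|_\omega^2 \le 2\{\|\widetilde\beta_m - \beta\|_\omega^2 + C(\eta/n)\,\|\widetilde\beta_m\|_\omega^2\},$$

where, under the condition $\lambda_j \ge 2\alpha$ for each $1 \le j \le m$, we have used that $P(\widehat\lambda_j < \alpha) \le C\eta/n$. Then, under Assumption 2.1, i.e., $(\omega_j/\gamma_j)$ is non-increasing, the usual estimate $\|\widetilde\beta_m - \beta\|_\omega^2 \le (\omega_m/\gamma_m)\|\beta\|_\gamma^2$ implies (A.8), which completes the proof. □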
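The elementary bound used in the proof of (A.7) holds for every $\widehat\lambda_j > 0$, so it can be spot-checked on a grid. The following sketch evaluates both sides. The grid and the parameter values are chosen arbitrarily. Equality is attained at $\widehat\lambda_j = \alpha = \lambda_j/2$, so a tiny relative tolerance is used.

```python
import numpy as np


def lhs(lam, lam_hat, alpha):
    """|lambda_j / hat lambda_j|^2 * 1{hat lambda_j >= alpha}."""
    return (lam / lam_hat) ** 2 * (lam_hat >= alpha)


def rhs(lam, lam_hat, alpha):
    """2 {2 (lambda_j/alpha)^2 |x - 1|^4 + 2 |x - 1|^2 + 1} with x = hat lambda_j / lambda_j."""
    x = lam_hat / lam
    return 2 * (2 * (lam / alpha) ** 2 * np.abs(x - 1) ** 4 + 2 * np.abs(x - 1) ** 2 + 1)


lam_hat = np.logspace(-6, 2, 4001)            # hypothetical values of hat lambda_j
x = lam_hat / 0.1
assert np.all((x - 1) ** 2 + x ** 2 >= 0.5 - 1e-12)    # the elementary inequality itself
for lam in (1e-3, 0.1, 1.0):
    for alpha in (1e-5, 1e-2, 0.5):
        assert np.all(lhs(lam, lam_hat, alpha) <= rhs(lam, lam_hat, alpha) * (1 + 1e-12))
print("bound holds on the grid")
```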
Technical assertions.

The following two lemmas gather technical results used in the proofs of Proposition 3.1, Theorem 3.2 and Theorem 3.3.

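Lemma A.2. Suppose $X \in \mathcal X_\eta^m$ and $\varepsilon \in \mathcal E_\eta^m$, $m \in \mathbb N$. Then for some constant $C > 0$ only depending on $m$ we have

$$\sup_{j\in\mathbb N}\big\{\lambda_j^{-m}\,\mathbb E|T_{n,j}|^{2m}\big\} \le C\,n^{-m}\,\{\|\beta\|^{2m}(\mathbb E\|X\|^2)^m + \sigma^{2m}\}\,\eta, \qquad (A.10)$$
$$\sup_{j\in\mathbb N}\mathbb E|\widehat\lambda_j/\lambda_j - 1|^{2m} \le C\,n^{-m}\,\eta. \qquad (A.11)$$

If in addition $w_1 \ge 2$ and $w_2 \le 1/2$, then we obtain

$$\sup_{j\in\mathbb N}P(\widehat\lambda_j/\lambda_j \ge w_1) \le C\,n^{-m}\,\eta\quad\text{and}\quad\sup_{j\in\mathbb N}P(\widehat\lambda_j/\lambda_j \le w_2) \le C\,n^{-m}\,\eta. \qquad (A.12)$$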

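Proof. Let $\zeta_{ij} := \sum_{l\ne j}\beta_l X_{il}$, $i = 1,\dots,n$ and $j \in \mathbb N$. Then we have

$$T_{n,j} = \frac1n\sum_{i=1}^n(\zeta_{ij} + \sigma\varepsilon_i)X_{ij} =: T_1 + T_2,$$

and we bound each summand separately below, that is,

$$\mathbb E|T_1|^{2m} \le C\,\frac{\lambda_j^m}{n^m}\,\|\beta\|^{2m}(\mathbb E\|X\|^2)^m\,\eta, \qquad (A.13)$$
$$\mathbb E|T_2|^{2m} \le C\,\frac{\lambda_j^m}{n^m}\,\sigma^{2m}\,\eta, \qquad (A.14)$$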

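for some $C > 0$ only depending on $m$. Consequently, the inequality (A.10) follows from (A.13) and (A.14). Consider $T_1$. For each $j \in \mathbb N$ the random variables $\zeta_{ij}X_{ij}$, $i = 1,\dots,n$, are independent and identically distributed with mean zero. From Theorem 2.10 in Petrov [1995] we conclude $\mathbb E|T_1|^{2m} \le Cn^{-m}\,\mathbb E|\zeta_{1j}X_{1j}|^{2m}$ for some constant $C > 0$ only depending on $m$. We claim that (A.13) then follows from the Cauchy-Schwarz and Hölder inequalities together with $X_1 \in \mathcal X_\eta^m$, i.e., $\sup_j\mathbb E|X_{1j}/\sqrt{\lambda_j}|^{4m} \le \eta$. Indeed, since $|\zeta_{1j}|^2 \le \|\beta\|^2\|X_1\|^2$, we have

$$\mathbb E|\zeta_{1j}X_{1j}|^{2m} \le \|\beta\|^{2m}\sum_{l_1}\cdots\sum_{l_m}\mathbb E\Big[|X_{1j}|^{2m}\prod_{k=1}^m|X_{1l_k}|^2\Big] \le \|\beta\|^{2m}\,\lambda_j^m\Big(\sum_l\lambda_l\Big)^m\eta,$$

which is (A.13) because $\sum_l\lambda_l = \mathbb E\|X\|^2$.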

Consider $T_2$. (A.14) follows analogously to the case of $T_1$, because the variables $\sigma\varepsilon_iX_{ij}$ are independent and identically distributed with mean zero, and $\mathbb E|\sigma\varepsilon_1X_{1j}|^{2m} \le \sigma^{2m}\lambda_j^m\eta$.

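Proof of (A.11). We have $\widehat\lambda_j/\lambda_j - 1 = \frac1n\sum_{i=1}^n(|X_{ij}|^2/\lambda_j - 1)$, where the summands are independent and identically distributed with mean zero, and $\mathbb E|X_{1j}^2/\lambda_j|^{2m} \le \eta$. Hence the result follows by applying Theorem 2.10 in Petrov [1995].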

Proof of (A.12). If $w \ge 2$, then $P(\widehat\lambda_j/\lambda_j \ge w) \le P(|\widehat\lambda_j/\lambda_j - 1| \ge 1)$. Thus applying Markov's inequality together with (A.11) implies the first bound in (A.12). The second bound follows analogously, which proves the lemma. □
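As an illustration, consider Gaussian coefficients $X_{ij}/\sqrt{\lambda_j} \sim \mathcal N(0,1)$. This is an illustrative assumption that the lemma does not require; such coefficients satisfy the moment condition for every $m$. In this case $\widehat\lambda_j/\lambda_j$ is distributed as $\chi^2_n/n$. The Monte Carlo sketch below shows the $n^{-1}$ decay in (A.11) with $m = 1$, where $\mathbb E|\widehat\lambda_j/\lambda_j - 1|^2 = 2/n$ exactly. It also shows the decay of the probabilities in (A.12) with $w_1 = 2$ and $w_2 = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 20000


def ratio_draws(n):
    """Monte Carlo draws of hat lambda_j / lambda_j for Gaussian coefficients."""
    return np.mean(rng.standard_normal((reps, n)) ** 2, axis=1)


results = {}
for n in (10, 40, 160):
    r = ratio_draws(n)
    results[n] = (
        n * np.mean((r - 1) ** 2),   # n * E|hat lambda/lambda - 1|^2, close to 2
        np.mean(r <= 0.5),           # P(hat lambda/lambda <= 1/2)
        np.mean(r >= 2.0),           # P(hat lambda/lambda >= 2)
    )
    print(n, *results[n])
```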

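Lemma A.3. Let $m_* \in \mathbb N$ and $\delta_*$ be chosen such that (3.1) is satisfied for some $\Delta \ge 1$. Consider an (infinite) vector $b$ with components $b_j$ satisfying

$$b_j^2 = \frac{\zeta}{n\upsilon_j},\quad j \in \mathbb N,\qquad\text{with}\quad\zeta := \min\big(\sigma^2/(2d),\,\rho/\Delta\big); \qquad (A.15)$$

then we have for all $j \in \mathbb N$

$$\frac{2dn}{\sigma^2}\,b_j^2\,\upsilon_j \le 1,\qquad\sum_{j=1}^{m_*}b_j^2\gamma_j \le \rho\qquad\text{and}\qquad\sum_{j=1}^{m_*}b_j^2\omega_j \ge \frac{\zeta}{\Delta}\max(\delta_*, 1/n). \qquad (A.16)$$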

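Proof. The first inequality in (A.16) follows trivially from the definition of $\zeta$. By Assumption 2.1 the sequence $(\gamma_j/\omega_j)$ is non-decreasing, so the definition of $m_*$ given in (3.1) implies the second estimate in (A.16), i.e., $\sum_{j=1}^{m_*}b_j^2\gamma_j \le \zeta(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/(n\upsilon_j) \le \zeta\Delta \le \rho$. To deduce the third inequality in (A.16) from the definition of $m_*$ and $\delta_*$, observe that $\sum_{j=1}^{m_*}b_j^2\omega_j = \zeta(\omega_{m_*}/\gamma_{m_*})(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/(n\upsilon_j) \ge \zeta\delta_*/\Delta$. Moreover, $\sum_{j=1}^{m_*}b_j^2\omega_j \ge b_1^2\omega_1 = \zeta/n$ since $\omega_1/\upsilon_1 = 1$, which proves the lemma. □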
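Here is a numerical check of Lemma A.3 in the polynomial mean-prediction-error setting of Proposition 4.1(i), i.e., $\omega_j = \upsilon_j = j^{-2a}$ and $\gamma_j = j^{2p}$. The check assumes that (3.1) requires $\Delta^{-1} \le (\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/(n\upsilon_j) \le \Delta$ with $\delta_* = \omega_{m_*}/\gamma_{m_*}$, which is how (3.1) enters the proofs in this appendix. It takes $b_j^2 = \zeta/(n\upsilon_j)$ with $\zeta = \min(\sigma^2/(2d), \rho/\Delta)$ as in (A.15). All numerical values are hypothetical.

```python
import numpy as np

n, p, a = 10**5, 2.0, 1.0            # hypothetical sample size and smoothness indices
d, sigma, rho = 2.0, 1.0, 1.0        # hypothetical link constant, noise level and radius


def upsilon(j):
    return np.asarray(j, dtype=float) ** (-2 * a)


def gamma(j):
    return np.asarray(j, dtype=float) ** (2 * p)


omega = upsilon                      # mean prediction error: omega = upsilon

m_star = max(1, round(n ** (1 / (2 * p + 2 * a + 1))))
idx = np.arange(1, m_star + 1)
q = gamma(m_star) / omega(m_star) * np.sum(omega(idx) / (n * upsilon(idx)))
Delta = max(2.0, q, 1 / q)           # any Delta >= 1 with 1/Delta <= q <= Delta will do
delta_star = omega(m_star) / gamma(m_star)

zeta = min(sigma ** 2 / (2 * d), rho / Delta)
b2 = zeta / (n * upsilon(idx))       # b_j^2, j = 1, ..., m_star

first = 2 * d * n / sigma ** 2 * b2 * upsilon(idx)   # each entry should be <= 1
second = np.sum(b2 * gamma(idx))                      # should be <= rho
third = np.sum(b2 * omega(idx))                       # should be >= zeta/Delta * max(delta_*, 1/n)
print(first.max(), second, third, zeta / Delta * max(delta_star, 1 / n))
```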

A.2 Proofs of Section 4
The mean prediction error. 

Proof of Proposition 4.1. The eigenvalues $(\lambda_j)$ of $T_{cov}$ satisfy a link condition, that is, $(\lambda_j) \in \mathcal S_\upsilon^d$. It follows that $d^{-1}\,\mathbb E\|\breve\beta - \beta\|_\upsilon^2 \le \mathbb E\langle T_{cov}(\breve\beta - \beta), \breve\beta - \beta\rangle \le d\,\mathbb E\|\breve\beta - \beta\|_\upsilon^2$. Therefore, we can apply the general results by considering the $\mathcal F_\omega$-risk with $\omega = \upsilon$. Furthermore, in case (i) the definitions of $\gamma = w_p$ and $\upsilon$ together imply $(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/\upsilon_j \asymp m_*^{2a+2p+1}$. It follows that the condition on $m_*$ and $\delta_*$ given in (3.1) of Theorem 3.2 can be rewritten as $m_* \sim n^{1/(2p+2a+1)}$ and $\delta_* \sim n^{-(2p+2a)/(2p+2a+1)}$, respectively. On the other hand, in case (ii) the relation $(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/\upsilon_j \asymp m_*^{2p+1}\exp(m_*^{2a})$ implies that the condition on $m_*$ and $\delta_*$ reads $m_* \sim (\log n)^{1/(2a)}$ and $\delta_* \sim n^{-1}(\log n)^{1/(2a)}$, respectively. Consequently, the lower bounds in Proposition 4.1 follow by applying Theorem 3.2. □
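The rewriting of (3.1) in the proof of Proposition 4.1 can be illustrated numerically. The sketch assumes that (3.1) brackets $q(m) = (\gamma_m/\omega_m)\sum_{j\le m}\omega_j/(n\upsilon_j)$ between $\Delta^{-1}$ and $\Delta$. It takes $m_*$ as the smallest $m$ with $q(m) \ge 1$, which is a hypothetical selection rule. For $\omega = \upsilon$ and $\gamma_j = j^{2p}$, it compares $m_*$ with $n^{1/(2p+2a+1)}$ in case (i) and with $(\log n)^{1/(2a)}$ in case (ii).

```python
import math


def q_ratio(m, n, p, upsilon):
    """(gamma_m / omega_m) * sum_{j<=m} omega_j / (n upsilon_j) for omega = upsilon, gamma_j = j^(2p).

    With omega = upsilon every summand equals 1/n, so the sum is m/n.
    """
    return m ** (2 * p) / upsilon(m) * m / n


def m_star(n, p, upsilon):
    """Smallest m with q(m) >= 1 (a hypothetical way to meet (3.1))."""
    m = 1
    while q_ratio(m, n, p, upsilon) < 1:
        m += 1
    return m


a, p = 1.0, 1.0                      # hypothetical indices


def poly(j):                         # case (i): upsilon_j = j^(-2a), j >= 2
    return 1.0 if j == 1 else j ** (-2 * a)


def expo(j):                         # case (ii): upsilon_j = exp(-j^(2a)), j >= 2
    return 1.0 if j == 1 else math.exp(-j ** (2 * a))


ratios = []
for n in (1e4, 1e8, 1e12):
    r_poly = m_star(n, p, poly) / n ** (1 / (2 * p + 2 * a + 1))
    r_expo = m_star(n, p, expo) / math.log(n) ** (1 / (2 * a))
    ratios.append((r_poly, r_expo))
    print(int(n), r_poly, r_expo)
```

Both ratios stay of order one, in line with $m_* \sim n^{1/(2p+2a+1)}$ and $m_* \sim (\log n)^{1/(2a)}$.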

Proof of Proposition 4.2. In both cases the conditions on the dimension parameter $m$ and the threshold $\alpha$ ensure that $m \sim m_*$ and $\alpha \sim 1/n$ (see the proof of Proposition 4.1). Hence the result follows from Theorem 3.3. □

The estimation of derivatives. 

Proof of Proposition 4.3. Since $\mathbb E\|\breve\beta^{(s)} - \beta^{(s)}\|^2 \asymp (2\pi)^{2s}\,\mathbb E\|\breve\beta - \beta\|_{w_s}^2$ for $0 \le s < p$, we can again apply the general results by considering the $\mathcal F_\omega$-risk with $\omega = w_s$. In case (i), the well-known approximation $\sum_{j=1}^m j^r \asymp m^{r+1}$ for $r > 0$, together with the definitions of $\gamma = w_p$ and $\upsilon$, implies $(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/\upsilon_j \asymp m_*^{2a+2p+1}$. It follows that the condition on $m_*$ and $\delta_*$ given in (3.1) of Theorem 3.2 reads $m_* \sim n^{1/(2p+2a+1)}$ and $\delta_* \sim n^{-(2p-2s)/(2p+2a+1)}$, respectively. On the other hand, in case (ii), Laplace's method (cf. Chapter 3.7 in Olver [1974]) and the definitions of $\gamma = w_p$ and $\upsilon$ imply $(\gamma_{m_*}/\omega_{m_*})\sum_{j=1}^{m_*}\omega_j/\upsilon_j \asymp m_*^{2p}\exp(m_*^{2a})$. Therefore, the condition on $m_*$ and $\delta_*$ can be rewritten as $m_* \sim (\log n)^{1/(2a)}$ and $\delta_* \sim (\log n)^{-(p-s)/a}$, respectively. Consequently, the lower bounds in Proposition 4.3 follow by applying Theorem 3.2. □
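Case (ii) in the proof of Proposition 4.3 can be checked numerically for the illustrative values $a = 1$, $p = 2$, $s = 1$. The sketch uses $\omega_j = j^{2s}$, $\gamma_j = j^{2p}$, $\upsilon_1 = 1$ and $\upsilon_j = \exp(-j^{2a})$; these weight sequences are assumptions made for the illustration. It checks two things. First, $(\gamma_m/\omega_m)\sum_{j\le m}\omega_j/\upsilon_j$ stays within a constant factor of $m^{2p}\exp(m^{2a})$. Second, $\delta_* = \omega_{m_*}/\gamma_{m_*}$ behaves like $(\log n)^{-(p-s)/a}$ when $m_*$ is the smallest $m$ with $q(m) \ge 1$. Both the identification of $\delta_*$ and this selection rule are assumptions about (3.1).

```python
import math

a, p, s = 1.0, 2.0, 1.0              # hypothetical indices


def omega(j):
    return j ** (2 * s)


def gamma(j):
    return j ** (2 * p)


def upsilon(j):
    return 1.0 if j == 1 else math.exp(-j ** (2 * a))


def weighted_sum(m):
    """(gamma_m / omega_m) * sum_{j<=m} omega_j / upsilon_j."""
    return gamma(m) / omega(m) * sum(omega(j) / upsilon(j) for j in range(1, m + 1))


# Laplace-type approximation: the ratio to m^(2p) exp(m^(2a)) stays bounded
laplace_ratios = [weighted_sum(m) / (m ** (2 * p) * math.exp(m ** (2 * a))) for m in range(2, 9)]


def m_star(n):
    """Smallest m with weighted_sum(m) / n >= 1 (hypothetical selection rule for (3.1))."""
    m = 1
    while weighted_sum(m) / n < 1:
        m += 1
    return m


for n in (1e6, 1e12):
    m = m_star(n)
    delta_star = omega(m) / gamma(m)
    print(int(n), m, delta_star / math.log(n) ** (-(p - s) / a))
print(laplace_ratios)
```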

Proof of Proposition 4.4. In both cases the conditions on the dimension parameter $m$ and the threshold $\alpha$ ensure that $m \sim m_*$ and $\alpha \sim 1/n$ (see the proof of Proposition 4.3). Hence the result follows from Theorem 3.3. □